Statistical Analysis 
in Psychology 
and Education 





George A. Ferguson 

Chairman, Department of Psychology, McGill University


Second Edition 


International Student Edition 



McGRAW-HILL • LONDON 


New York • Sydney • Toronto • Mexico • Johannesburg • Panama





Printed and bound by MLADINSKA KNJIGA LJUBLJANA YUGOSLAVIA 



Preface 


The object of this book, as of the first edition, is to introduce students
and research workers in psychology and education to the concepts and 
applications of statistics. Emphasis is placed on the analysis and inter- 
pretation of data resulting from the conduct of experiments. Students 
and investigators in experimental medicine, psychiatry, sociology, and 
other disciplines may also find the book useful. 

This book may be used as a text for either a one-semester or a full- 
year course in statistics. When it is used as a text for a one-semester 
course the instructor may exercise a choice in the selection of material. 
The selection will usually include most of Chaps. 1 to 13 and possibly 
sections in some of the remaining chapters. Different instructors hold 
divergent views with regard to the content of introductory courses in 
statistics. This book has been designed to permit the instructor some 
freedom of choice in the selection of course content. 

I have attempted not only to introduce the student to the practical 
technology of statistics but also to explain in a nonmathematical and frequently
intuitive way the nature of statistical ideas. This is not always
easy. Obviously, the extent to which an understanding of statistics 
can be communicated without some mathematical knowledge is limited. 
Skill in high school or freshman algebra will prove most helpful to the
student. 

The writing of a book of this type demands numerous compromises 
between a tidy logical arrangement of material, sound pedagogy, and 
common usage, which are not always compatible. The desire for completeness
has led to the inclusion of occasional sections which perhaps
should not be included in an introductory text. The instructor can 
readily identify these sections and omit them if he chooses. 

In preparing the second edition of this book many changes have 
been made, some important and some trivial; also a number of new 
chapters have been added. The material in the two chapters of the 
first edition, “Essential Ideas of Sampling” and “Tests of Significance,”
has been replaced by four shorter chapters, “Sampling,” “Estimation,” 
“Tests of Significance: Means,” and “Tests of Significance: Other 
Statistics.” New chapters have been added on the “Analysis of Co- 
variance” and “Trend Analysis.” A chapter on “The Structure and 
Planning of Experiments” has been included; it is intended to provide the 
student with a simple nonmathematical introduction to experimental
design as a preliminary to the study of the analysis of variance. Other less 
important additions relate to multiple comparisons, directional versus 
nondirectional tests, and nonparametric methods of trend analysis. 

In the present edition we have chosen to define initially the sample 
variance using N − 1, and not N as in the first edition. While this has
advantages in some situations, it occasionally leads to difficulty in 
problems of correlation. In a few instances a definition using N has 
been used for convenience and simplicity. The difference between the 
definition of the sample variance using N − 1 and that using N is made
clear at an early stage and should present no difficulty to the student. 

The exercise material in the second edition has been expanded and, 
I hope, improved. 

The usefulness of this book is enhanced by the kindness of authors 
and publishers who have permitted the adaptation and reproduction of
tables and other materials published originally by them. I should like 
to express my gratitude to Francis G. Cornell, Allen L. Edwards, R. W. B. 
Jackson, M. G. Kendall, John F. Kenny, Don Lewis, Quinn McNemar, 
Edwin G. Olds, George W. Snedecor, Herbert Sorenson, James E. Wert, 
and Frank Wilcoxon; and to the Scottish Council for Research in Educa- 
tion; the University of London Press Ltd.; Charles Griffin & Company, 
Ltd.; Prentice-Hall, Inc.; John Wiley & Sons, Inc.; D. Van Nostrand 
Company, Inc.; Rinehart & Company, Inc.; Iowa State College Press; 
and the Annals of Mathematical Statistics. I am indebted to Messrs.
Oliver and Boyd, Ltd., Edinburgh, for permission to reprint Tables III, 
IV, and VI of their book Statistical Tables for Biological, Agricultural,
and Medical Research. 

I should like to express here my indebtedness to the late Sir Godfrey 
H. Thomson and to W. G. Emmett and D. N. Lawley, all of the University 
of Edinburgh. These three are responsible for my persisting interest in 
the applications of statistical method to psychological problems. In 
particular, I should like to express my gratitude to Lady Thomson for 
permission to reproduce certain tables from Sir Godfrey’s work. 

This book has benefited greatly by many constructive criticisms and 
suggestions on the manuscript from Julian C. Stanley of the University 
of Wisconsin and Lyle Jones of the University of North Carolina. 

George A. Ferguson 



Contents 


Preface

1. Basic Ideas in Statistics
2. Frequency Distributions and Their Graphic Representation
3. Averages
4. Measures of Variation, Skewness, and Kurtosis
5. Probability and the Binomial Distribution
6. The Normal Curve
7. Correlation
8. Prediction in Relation to Correlation
9. Sampling
10. Estimation
11. Tests of Significance: Means
12. Tests of Significance: Other Statistics
13. Chi Square
14. Rank Correlation Methods
15. Other Varieties of Correlation
16. Transformations: Their Nature and Purpose
17. The Structure and Planning of Experiments
18. Analysis of Variance: One-way Classification
19. Analysis of Variance: Two-way Classification
20. Analysis of Covariance
21. Trend Analysis
22. Selected Nonparametric Tests
23. Errors of Measurement
24. Partial and Multiple Correlation

Appendix
Glossary of Symbols
References
Index



Basic Ideas 
in Statistics 


1.1 Introduction

This book is concerned with the elementary statistical treatment of 
experimental data in psychology, education, and related disciplines. The 
data resulting from any experiment are usually a collection of observa- 
tions or measurements. The conclusions to be drawn from the experi- 
ment cannot be reliably ascertained by simple direct inspection of the 
data. Classification, summary description, and rules of evidence for 
the drawing of valid inference are required. Statistics provides the 
methodology whereby this can be done. 

Implicit in any experiment is the presumption that it is possible to 
argue validly from the particular to the general and that new knowledge 
can be obtained by the process of inductive inference. The statistician 
does not assume that such arguments can be made with certainty. On 
the contrary, he assumes that some degree of uncertainty must attach to 
all such arguments; that some of the inferences drawn from the data of 
experiments are wrong. He further assumes that the uncertainty itself 
is amenable to precise and rigorous treatment, that it is possible to make 
rigorous statements about the uncertainty which attaches to any par- 
ticular inference. Thus in the uncertain milieu of experimentation he 
applies a rigorous method. 

A knowledge of statistics is an essential part of the training of all 
students in psychology. There are many reasons for this. First, an
understanding of the modern literature of psychology requires a knowledge
of statistical method and modes of thought. A high proportion of
current books and journal articles either report experimental findings in
statistical form or present theories or arguments involving statistical 
concepts. These concepts play an increasing role in our thinking about 
psychological problems, quite apart from the treatment of data. The 
student need only consider, for example, the role of statistical concepts 
in current lines of theorizing in the field of learning to grasp the force of 
this argument. Second, training in psychology at an advanced level
requires that the student himself design and conduct experiments. The 
design of an experiment is inseparable from the statistical treatment of 
the results. Experiments must be designed to enable the treatment of 
results in such a way as to permit clear interpretation, and to fulfill the 
purposes which motivated the experiment in the first place. If the design 
of an experiment is faulty, no amount of statistical manipulation can 
lead to the drawing of valid inferences. Experimental design and 
statistical procedures are two sides of the same coin. Thus not only must 
the advanced student conduct experiments and interpret results, he must 
plan his experiments in such a way that the interpretation of results can 
conform to known rules of scientific evidence. Third, training in statistics
is training in scientific method. Statistical inference is scientific
inference, which in turn is inductive inference, the making of general 
statements from the study of particular cases. These terms are for all 
practical purposes, and at a certain level of generality, synonymous. 
Statistics attempts to make induction rigorous. Induction is regarded 
by some scholars as the only way in which new knowledge comes into the 
world. While this statement is debatable, the role in modern society of 
scientific discovery through induction is obviously of the greatest importance.
For this reason no serious student of psychology, or any other
discipline, can afford not to know something of the rudiments of the 
scientific approach to problems. Statistical procedures and ideas play 
an important role in this approach. 


1.2 The broad role of quantification in psychology

While this book is largely concerned with elementary statistical pro- 
cedures and ideas, some mention may be made of the broad role of 
quantitative method in psychology. 

The attempt to quantify has a long and distinguished history in 
experimental psychology, which indeed may be regarded as synonymous 
with the history of that science itself. Since the experimental work in 
psychophysics of E. H. Weber and Gustav Fechner in the nineteenth 
century, determined attempts have been made to develop psychology as 
an experimental science. The early psychophysicists were concerned
with the relationship between the “mind” and the “body” and developed 
certain mathematical functions which they held to be descriptive of that 
relationship. While much of their thinking on the mind-body problem 
has been discarded, their methods and techniques with development and 
elaboration are still used. Shorn of its philosophical and theoretical 
encumbrances, the work of the early psychophysicists was reduced in
effect to the study of the relationship between measurements, obtained 
in two different ways, of what were presumed to be the same property. 
Thus, for example, they studied the relationship between weight, length, 
and temperature, defined by the responses of human subjects as instru- 
ments, and weight, length, and temperature, defined by other measur- 
ing instruments, scales, foot rules, and thermometers. A psychophysical 
law, so called, is a statement of the relationship between measurements 
obtained by these two methods. Modern psychophysics is concerned to 
some considerable extent with the scaling of the responses of the human 
subject as instrument and with the use of the human subject as instru- 
ment in dealing with a wide variety of practical problems. It may per- 
haps be referred to as human instrumentation. 

The early psychophysicists invented certain experimental methods 
and developed statistical procedures for handling the data obtained by 
these methods. It is of interest to note that one method, the constant 
process, developed by G. E. Muller and F. M. Urban, has recently, with
modification, found application in biological-assay work in assessing the
potency of hormones, toxicants, and drugs of all types. It is currently
known in biology as the method of probits.

Statistical methods have found extensive application in the psycho- 
logical testing field and in the study of human ability. Since the time of 
Binet, who developed the first extensively used test of intelligence and 
whose thinking was influenced by the early psychophysicists, a compre- 
hensive body of theory and technique has been developed which is pri- 
marily statistical in type. This body of theory and technique is con- 
cerned with the construction of instruments for measuring human 
ability, personality characteristics, attitudes, interests, and many other 
aspects of behavior; with the nature and magnitude of the errors involved 
in such measurement; with the logical conditions which such measuring 
instruments must satisfy; with the quantitative prediction of human 
behavior; and with other related topics. 

The use of psychological tests stimulated the development of the 
techniques of factor analysis, which are used to some extent in con- 
temporary psychology. Problems arise which involve a study of the 
relationships between sets of variables, sometimes as many as 50 or 60 
and perhaps more. Factor analysis attempts to provide a simplified 
description of these relationships, which facilitates an interpretation and
comprehension of the information in the data. Factor analysis has 
found a number of uses in branches of science other than psychology, 
including meteorology and agriculture. Some of the problems in factor 
analysis have not as yet been fully resolved. 

Within recent years frequent use has been made of statistical concepts
in the construction of models designed to provide some explanation
and understanding of observable phenomena. Such models are used in 
the field of learning. Further, many biological scientists are currently 
concerned with the construction of models which may possibly bear some 
correspondence to the functioning of certain aspects of the central 
nervous system. While these attempts may be premature and their 
success cannot at this time be evaluated, it is possible that in future the 
models which will prove helpful in understanding the functioning of the 
human brain will either implicitly or explicitly involve statistical con- 
cepts. In a system comprised of a complex network of nerve fibers, the 
transmission of impulses can be conceived in probabilistic terms. 

While the avenues of quantification mentioned above do not fall 
within the context of this book, their study demands a knowledge of 
statistical method and a comprehension of the basic ideas of statistics as 
a starting point. It would seem that as psychology develops, increasing 
emphasis will be placed on quantitative procedure and an increasing 
degree of statistical sophistication will be required of the student. 


1.3 Statistics as the study of populations

Statistics is a branch of scientific methodology. It deals with the collec- 
tion, classification, description, and interpretation of data obtained by the 
conduct of surveys and experiments. Its essential purpose is to describe 
and draw inferences about the numerical properties of populations. The 
terms population and numerical property require clarification. 

In everyday language the term population is used to refer to groups or 
aggregates of people. We speak, for example, of the population of the 
United States, or of the state of Texas, or of the city of New York, mean- 
ing by this all the people who occupy defined geographical regions at 
specified times. This, however, is a particular usage of the term popula- 
tion. The statistician employs the term in a more general sense to refer 
not only to defined groups or aggregates of people, but also to defined 
groups or aggregates of animals, objects, materials, measurements, or 
“things” or “happenings” of any kind. Thus the statistician may define, 
for his particular purposes, populations of laboratory animals, trees, nerve 
fibers, liquids, soil, manufactured articles, automobile accidents, microorganisms,
birds' eggs, insects, or fishes in the sea. On occasion he may
deal with a population of measurements. By this is meant an indefi- 
nitely large aggregate of measurements which, hypothetically, might be 
obtained under specified experimental conditions. To illustrate, a series 
of measurements might be made of the length of a desk. Some or all of
these measurements may differ one from another because of the presence 
of errors of measurement. This series of measurements may be regarded 
as part of an indefinitely large aggregate or population of measurements 
which might, hypothetically, be obtained by measuring the length of the 
desk over and over again an indefinitely large number of times. 

The general concept implicit in all these particular uses of the word 
population is that of group or aggregation. The statistician’s concern is 
with properties which are descriptive of the group or aggregation itself 
rather than with properties of particular members. Thus measurements 
may be made of the height and weight of a group of individuals. These 
measurements may be added together and divided by the number of cases 
to obtain the mean height and weight. These means describe a property
of the group as a whole and are not descriptive of particular individuals.
To illustrate further, a child may have an IQ of 90 and belong to a high 
socioeconomic group. Another child may have an IQ of 120 and belong 
to a low socioeconomic group. These facts as such about individual 
children do not directly concern the statistician. If, however, questions 
are raised about the proportion of children in a particular population or 
subpopulation with IQ’s above or below a specified value, or if more
general questions are raised about the relationship between intelligence 
and socioeconomic level, then these are questions of a statistical nature, 
and the statistician has techniques which assist their exploration. 
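
The contrast between facts about individuals and properties of the group can be put in a short sketch. The heights and IQs below are invented for illustration, and the helper name proportion_above is an assumption of this example rather than a term used in the text.

```python
from statistics import mean

# Invented measurements for a small group (illustrative only).
heights_in = [66.0, 70.5, 68.0, 72.0, 64.5]
iqs = [90, 120, 105, 98, 131, 87, 110, 101]


def proportion_above(values, cutoff):
    """Return the proportion of values strictly greater than cutoff."""
    return sum(1 for v in values if v > cutoff) / len(values)


# Both results describe the group as a whole, not any one member.
mean_height = mean(heights_in)
prop_iq_above_100 = proportion_above(iqs, 100)

print(f"mean height: {mean_height:.1f} in.")
print(f"proportion with IQ above 100: {prop_iq_above_100:.3f}")
```

No single child in the list need have the mean height, and the proportion says nothing about which children lie above the cutoff; both figures are properties of the aggregate.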

The distinction is sometimes made between finite and infinite popula- 
tions. The children attending school in the city of Chicago, the inmates 
of penitentiaries in Ontario, the cards in a deck are examples of finite 
populations. The members of such a population can presumably be
counted, and a finite number obtained. The possible rolls of a die and 
the possible observations in many scientific experiments are examples of 
infinite or indefinitely large populations. The number of rolls of a die or 
the number of scientific observations may, at least theoretically, be 
increased without any finite limit. In many situations the populations 
which the statistician proposes to describe are finite, but so large that 
for all practical purposes they may be regarded as infinite. The 200 
million or so people living in the United States constitute a large but 
finite population. This population is so large that for many types of 
statistical inference it may be assumed to be infinite. This would not 
apply to the cards in a deck, which may be thought of as a small finite
population of 52 members. 

Most populations are comprised of naturally distinguishable mem- 
bers, as is, of course, the case with people, animals, measurements, or 
the rolls of a die. Some populations are not so comprised, as is the case 
with liquids, soils, woven fabrics, or, for that matter, human behavior.
How is it possible to apply the concept of group or aggregation to popula- 
tions of this latter type? This may be done by defining the population 
member arbitrarily as a liter, a cubic centimeter, a square yard, or some 
such unit. The whole population may be thought to be composed of an 
aggregate of such members. Likewise, in the study of human behavior, 
the psychologist frequently concerns himself with arbitrarily defined bits 
of behavior, although behavior as such may perhaps be regarded as a continuous
flow or sequence.

Statistics is concerned with the numerical properties of populations, 
that is, with properties to which numerals can in some manner be 
assigned. The logical implications of the term numerical property are 
complex and need not be elaborated here. To illustrate briefly, however, 
in any population of mental-hospital patients some may be classed as 
psychoneurotic, others as schizophrenic psychotic, others as psychotic 
with organic brain disease, and so on. Further, some patients may come 
from broken homes, while others may have a normal healthy home back- 
ground. Some may have a history of mental disease in the family, and 
others may not. We may be said to apply a statistical method when we 
concern ourselves with how many patients in the population fall within 
these various classes, that is, how many are psychoneurotic, schizo- 
phrenic psychotic, and the like, and how many come from broken homes, 
how many do not, and so on. Further, the flicker fusion rates of some 
part or all of the population may be measured and attention directed to 
the numbers of patients who fall within specified ranges of flicker fusion 
rate, to mean rates for various classes of patients, and to related problems. 
The investigation of such problems as these may be said to involve 
a statistical method. In general, the statistician's concern is with those 
properties of populations which can be expressed in numerical form. 


1.4 Statistics as the study of variation

Statistics is sometimes conceptualized as the study of variation, because 
it provides a technology for the exploration of variation in the events of 
nature and for the making of inferences about the causal circumstances 
which underlie that variation. Emphasis on the study of variation 
originated with Darwin in The Origin of Species (1859). Variation
was a central concept in the theory of natural selection because evolution 
could not occur without it. In Darwin's words, 

The many slight differences which appear in the offspring from 
the same parent . . . may be called individual differences. . . . 
These individual differences are of the highest importance for us, 



sec. 1.4 


Statistics as the study of variation 


7 


for they are often inherited, as must be familiar to everyone; and 
they thus afford materials for natural selection to act on and 
accumulate. 

The matter is more clearly stated in an editorial in the first issue of the 
journal Biometrika (1901), probably written by Karl Pearson: 

The starting point of Darwin’s theory of evolution is precisely the 
existence of those differences between individual members of a 
race or species which morphologists for the most part rightly 
neglect. The first condition necessary, in order that any process 
of Natural Selection may begin among a race, or species, is the 
existence of differences among its members; and the first step in an 
enquiry into the possible effect of a selective process upon any 
character of a race must be an estimate of the frequency with 
which individuals, exhibiting any given degree of abnormality with
respect to that character, occur. The unit, with which such an
enquiry must deal, is not an individual but a race, or a statistically
representative sample of a race, and the result must take the form
of a numerical statement, showing the relative frequency with 
which the various kinds of individuals composing the race occur. 

Darwin made no direct contribution to statistical method. He did, 
however, create a theoretical context, based on observation and report, 
which made the study of variation meaningful and required, as it were, 
the development of statistical methods for its rigorous study. Darwin's 
disciple Galton understood fully the concept of variation. He was 
responsible for the initial applications of the so-called “normal” curve, or 
distribution, in psychological enquiry and made important contributions
to the development of methods of correlation. He greatly influenced 
Karl Pearson, his disciple and biographer. Pearson perceived his role 
as that of helping to build a mathematical basis for evolutionary theory;
he did, in fact, build the foundations of modern statistics. Between 
1894 and 1916 he published 19 papers and monographs on statistical 
subjects, some of great importance and comprehensiveness. All these 
papers were titled Contributions to the Mathematical Theory of Evolution.

Despite influences from many sources, modern statistics is largely 
a direct emergent of the biological revolution of the nineteenth century 
which Darwin helped to create. The central concept in evolutionary
theory which gave rise to this line of development was the concept of 
variation. Since Pearson the development of statistical method has 
been closely associated with the attempt to find solutions to biological 
problems. R. A. Fisher, who developed the analysis of variance and 
made contributions between 1920 and 1960 which exceeded those of any
other living person, devoted his life primarily to statistical problems of 
experimentation in the biological sciences and to the mathematical 
foundations of genetics. 


1.5 Samples and sampling

Because of the large size of many populations, it may be either impracticable
or impossible for the investigator to produce statistics based on all
members. If, for example, interest is in investigating the attitudes of 
adult Canadians toward immigrants, it would obviously be a prohibitively 
expensive and time-consuming task to measure the attitudes of all adult 
Canadians and produce statistics based on a study of the complete 
population. If a population is indefinitely large, it is of course impossible,
ipso facto, to produce complete population statistics. Under circumstances
such as these the investigator draws what is spoken of as a sample.
A sample is any subgroup or subaggregate drawn by some appropriate 
method from a population, the method used in drawing the sample being 
important. Methods used in drawing samples will be discussed in later
chapters of this book. Having drawn his sample, the investigator utilizes 
appropriate statistical methods to describe its properties. He then
proceeds to make statements about the properties of the population from 
his knowledge of the properties of the sample; that is, he proceeds to 
generalize from the sample to the population. To return to the example 
above, an investigator might draw a sample of 1,000 adult Canadians, 
the term adult being assigned a precise meaning, measure their attitudes 
toward immigrants using an acceptable technique of measurement, and 
calculate the required statistics. Questions may then be raised about the 
attitudes of all adult Canadians from the information obtained from a 
study of the sample of 1,000. 
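
The procedure just described can be sketched in a few lines. Everything concrete here is an assumption made for illustration: the population is a simulated list of 100,000 attitude scores on a hypothetical 1-to-7 scale, the sample is a simple random sample drawn without replacement, and the seed is fixed only so that the run can be repeated. It is not a description of any actual survey method.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical finite population: 100,000 attitude scores from 1 to 7.
population = [rng.randint(1, 7) for _ in range(100_000)]

# Draw a simple random sample of 1,000 members without replacement.
sample = rng.sample(population, 1000)

# In practice the population mean is unknown; it is computed here only
# because the population is simulated and can be inspected directly.
population_mean = mean(population)
sample_mean = mean(sample)

print(f"population mean: {population_mean:.3f}")
print(f"sample mean:     {sample_mean:.3f}")
```

The sample mean will ordinarily fall close to, but not exactly on, the population mean. The gap between the two is the error that the following paragraphs refer to.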

The fact that inferences can be made about the properties of populations
from a knowledge of the properties of samples is basic in research
thinking. Such statements are of course subject to error. The magnitude
of the error involved in drawing such inferences can, however, in
most cases be estimated by appropriate procedures. Where no estimate 
of error of any kind can be made, generalizations about populations from 
sample data are worthless. 

Information about properties of particular samples, quite apart from
any generalizations about the population, is of little intrinsic interest in 
itself. Consider a case where the investigator’s interest is in the relative 
effects of two types of psychotherapy when applied to patients suffering 
from a particular mental disorder. He may select two samples of 



sec. 1.5 


Samples and sampling 


9 


patients, apply one type of treatment to one sample and the other type of 
treatment to the other sample, and collect data on the relative rates of 
recovery of patients in the two samples. Clearly, in this case his interest
is in finding out whether the one treatment is better or worse than the 
other when applied to the whole class of patients suffering from the mental 
disorder in question. He is interested in the sample data only in so far 
as these data enable him to draw inferences with some acceptable degree 
of assurance about this general question. His experimental procedures 
must be designed to enable the drawing of such inferences, otherwise the 
experiment serves no purpose. On occasion research reports are found
where the investigator states that the experimental results obtained 
should not be generalized beyond the particular sample of individuals who 
participated in the study. The adoption of this view means that the
investigator has missed the essential nature of experimentation. Unless 
the intention is to generalize from a sample to a population, unless the 
procedures used are such as to enable such generalizations justifiably 
to be made, and unless some estimate of error can be obtained, the con- 
duct of experiments is without point. 

Statistical procedures used in describing the properties of samples, 
or of populations where complete population data are available, are 
referred to by some writers as descriptive statistics. If we measure the 
IQ of the complete population of students in a particular university and 
compute the mean IQ, that mean is a descriptive statistic because it 
describes a characteristic of the complete population. If, on the other 
hand, we measure the IQ of a sample of 100 students and compute the
mean IQ for the sample, that mean is also a descriptive statistic, because
it describes a characteristic of that sample.

Statistical procedures used in the drawing of inferences about the 
properties of populations from sample data are frequently referred to as
sampling statistics. If, for example, we wish to make a statement about
the mean IQ in the complete population of students in a particular university
from a knowledge of the mean computed on the sample of 100
and estimate the error involved in this statement, we use procedures from
sampling statistics. The application of these procedures provides
information about the accuracy of the sample mean as an estimate of the
population mean; that is, it indicates the degree of assurance we may
place in the inferences we draw from the sample to the population. 
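
As a rough illustration of the two kinds of procedure, the sketch below computes a descriptive statistic (the sample mean) and then one common measure of its accuracy, the standard error of the mean, taken as s divided by the square root of N. That formula is not stated in this section and is introduced here only as an assumption for the example. The ten IQ scores are invented, and a sample of 10 is used in place of 100 to keep the listing short.

```python
import math
from statistics import mean, stdev


def standard_error_of_mean(sample):
    """Estimate the standard error of the mean as s / sqrt(N).

    s is the sample standard deviation computed with N - 1 in the
    denominator, which is what statistics.stdev returns.
    """
    return stdev(sample) / math.sqrt(len(sample))


# Invented IQ scores for a sample of 10 students (illustrative only).
iq_sample = [100, 110, 90, 120, 80, 105, 95, 115, 85, 100]

sample_mean = mean(iq_sample)               # descriptive statistic
se_mean = standard_error_of_mean(iq_sample)  # sampling statistic

print(f"sample mean: {sample_mean}")
print(f"standard error of the mean: {se_mean:.3f}")
```

A smaller standard error indicates that the sample mean may be trusted more closely as an estimate of the population mean.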

While the distinction between descriptive and sampling statistics is a 
useful one, it may be emphasized that the ultimate object of statistical 
method is the making of statements about populations. A mean calcu- 
lated on a sample provides information about the population from which 
the sample is drawn, although in any particular instance the information 



zo 


Basic ideas in statistics 


chap, z 


may be very inaccurate. The ultimate intent is in all instances to 
find things out about populations. Most statistical methods, whether 
referred to as descriptive or sampling methods, are means to this end. 

In this section no discussion is advanced on methods of drawing 
samples or the conditions which these methods must satisfy to allow the 
drawing of valid inferences from the sample to the population. Further, 
no precise meaning has been assigned to the term error. These topics will 
be elaborated at a later stage. 


1.6 

Parameters and estimates 

A clear distinction is usually drawn between parameters and estimates. 
A parameter is a property descriptive of the population. The term 
estimate refers to a property of a sample drawn at random from a popula- 
tion. The sample value is presumed to be an estimate of a corresponding 
population parameter. Suppose, for example, that a sample of 1,000 
adult male Canadians of a given age range is drawn from the total 
population, the height of the members of the sample measured, and a 
mean value, 68.972 in., obtained. This value is an estimate of the 
population parameter which would have been obtained had it been 
possible to measure all the members in the population. Usually parame- 
ters or population values are unknown. We estimate them from our 
sample values. The distinction between parameter and estimate reflects 
itself in statistical notation. A widely used convention in notation is to 
employ Greek letters to represent parameters and Roman letters to 
represent estimates. Thus the symbol σ, the Greek letter sigma, may be 
used to represent the standard deviation in the population, the standard 
deviation being a commonly used measure of variability. The symbol s 
may be used as an estimate of the parameter σ. This convention in 
notation is applicable only within broad limits. By and large we shall 
adhere to this convention in this book, although in certain instances it will 
be necessary to depart from it. By common practice and tradition a 
Greek letter may be used on occasion to denote a sample statistic. 
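
The distinction between the parameter σ and its estimate s may be made concrete with a small computation. The sketch below is illustrative only. The eight heights are hypothetical values treated as a complete population, and the choice of divisor (N for the population value, n - 1 for the sample value) is an assumption of the sketch rather than a definition taken from this chapter.

```python
import random
import statistics

# Hypothetical population of eight heights in inches (illustrative values).
population_heights = [66.1, 67.4, 68.0, 68.9, 69.3, 70.2, 71.0, 72.5]

# Parameter: standard deviation of the whole population (divisor N).
sigma = statistics.pstdev(population_heights)

# Estimate: standard deviation of a random sample of four (divisor n - 1).
rng = random.Random(1)
sample_heights = rng.sample(population_heights, 4)
s = statistics.stdev(sample_heights)

print(f"sigma (parameter) = {sigma:.3f}")
print(f"s (estimate)      = {s:.3f}")
```

A different random sample would yield a different value of s, while σ remains fixed for the population.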


1.7 

Variables and their classification 

The term variable refers to a property whereby the members of a group or 
set differ one from another. The members of a group may be individuals 
and may be found to differ in sex, age, eye color, intelligence, auditory 
acuity, reaction time to a stimulus, attitudes toward a political issue, and 
many other ways. Such properties are variables. The term constant 
refers to a property whereby the members of a group do not differ one from 
another. In a sense a constant is a particular type of variable; it is a 
variable which does not vary from one member of a group to another or 
within a particular set of defined conditions. 

Labels or numerals may be used to describe the way in which one 
member of a group is the same as or different from another. With varia- 
bles like sex, racial origin, religious affiliation, and occupation, labels are 
employed to identify the members which fall within particular classes. 
An individual may be classified as male or female; of English, French, or 
Dutch racial origin; Protestant or Catholic; a shoemaker or a farmer; and 
so on. The label identifies the class to which the individual belongs. 
Sex for most practical purposes is a two-valued variable, individuals 
being either male or female. Occupation, on the other hand, is a multi- 
valued variable. Any particular individual may be assigned to any one 
of a large number of classes. With variables like height, weight, intel- 
ligence, and so on, measuring operations may be employed which enable 
the assignment of descriptive numerical values. An individual may be 
72 in. tall, weigh 190 lb, and have an IQ of 90. 

The particular values of a variable are referred to as variates, or 
variate values. To illustrate, in considering the height of adult males, 
height is the variable, whereas the height of any particular individual is a 
variate, or variate value. 

In dealing with variables which bear a functional relationship one to 
another the distinction may be drawn between dependent and independent 
variables. Consider the expression 

Y = f(X) 

This expression says that a given variable Y is some unspecified function 
of another variable X. The symbol f is used generally to express the 
fact that a functional relationship exists, although the precise nature of 
the relationship is not stated. In any particular case the nature of the 
relationship may be known; that is, we may know precisely what f means. 
Under these circumstances, for any given value of X a corresponding 
value of Y can be calculated; that is, given X and a knowledge of the 
functional relationship, Y can be predicted. It is customary to speak of 
Y, the predicted variable, as the dependent variable because the 
prediction of it depends on the value of X and the known functional 
relationship, whereas X is spoken of as the independent variable. Given an 
expression of the kind Y = X³, for any given value of X an exact value of 
Y can readily be determined. Thus if X is known, Y is also known 
exactly. Many of the functional relationships found in statistics permit 
probabilistic and not exact prediction to occur. Such relationships may 
provide the most probable value of Y for any given value of X, but do not 
permit the making of perfect predictions. 
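
The contrast between exact and probabilistic prediction may be sketched in a few lines of Python. The function `exact_y` implements the expression Y = X³ given above. The function `observed_y` is a hypothetical illustration of a probabilistic relationship, in which a random error is added to the same expression; the size of that error is an arbitrary assumption.

```python
import random

def exact_y(x):
    """Exact functional relationship: Y is the cube of X."""
    return x ** 3

rng = random.Random(42)

def observed_y(x, error_sd=2.0):
    """Hypothetical probabilistic relationship: the cube of X plus random error."""
    return x ** 3 + rng.gauss(0, error_sd)

# Exact prediction: the same X always gives the same Y.
print(exact_y(2), exact_y(2))

# Probabilistic prediction: the same X gives values scattered about 8.
print([round(observed_y(2), 1) for _ in range(3)])
```

In the second case the cube of X is the most probable value of Y, but no single observation need equal it.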

A distinction may be drawn between continuous and discrete (or 
discontinuous) variables. A continuous variable may take any value 
within a defined range of values. The possible values of the variable 
belong to a continuous series. Between any two values of the variable an 
indefinitely large number of in-between values may occur. Height, 
weight, and chronological time are examples of continuous variables. A 
discontinuous or discrete variable can take specific values only. Size of 
family is a discontinuous variable. A family may be comprised of 1, 2, 3, 
or more children, but values between these numbers are not possible. 
The values obtained in rolling a die are 1, 2, 3, 4, 5, and 6. Values 
between these numbers are not possible. Although the underlying 
variable may be continuous, all sets of real data in practice are discon- 
tinuous or discrete. Convenience and errors of measurement impose 
restrictions on the refinement of the measurement employed. 

Another classification of variables is possible which is of some impor- 
tance and is of particular interest to statisticians. This classification is 
based on differences in the type of information which different operations 
of classification or measurement yield. To illustrate, consider the follow- 
ing situations. An observer using direct inspection may rank order a 
group of individuals from the tallest to the shortest according to height. 
On the other hand, he may use a foot rule and record the height of each 
individual in the group in feet and inches. These two operations are 
clearly different, and the nature of the information obtained by applying 
the two operations is different. The former operation permits 
statements of the kind: individual A is taller or shorter than individual B. 
The latter operation permits statements of how much taller or shorter one 
individual is than another. Differences along these lines serve as a basis 
for a classification of variables, the class to which a variable belongs being 
determined by the nature of the information made available by the meas- 
uring operation used to define the variable. Four broad classes of varia- 
bles may be identified. These are referred to as (1) nominal, (2) ordinal, 
(3) interval, and (4) ratio variables. This classification is discussed in 
some detail by Stevens (1951, Chap. 1). A very interesting discussion 
relevant to this topic is given in Torgerson (1958). 

A nominal variable is a property of the members of a group defined by 
an operation which permits the making of statements only of equality or 
difference. Thus we may state that one member is the same as or different 
from another member with respect to the property in question. 
Statements about the ordering of members, or the equality of differences 
between members, or the number of times a particular member is greater 
than or less than another are not possible. To illustrate, individuals may 
be classified by the color of their eyes. Color is a nominal variable. The 
statement that an individual with blue eyes is in some sense “greater 
than” or “less than” an individual with brown eyes is meaningless. 
Likewise the statement that the difference between blue eyes and brown 
eyes is equal to the difference between brown eyes and green eyes is 
meaningless. The only kind of meaningful statement possible with the 
information available is that the eye color of one individual is the same 
as or different from the eye color of another. A nominal variable may 
perhaps be viewed as a primitive type of variable, and the operations 
whereby the members of a group are classified according to such a variable 
constitute a primitive form of measurement. In dealing with nominal 
variables numerals may be assigned to represent classes, but such numerals 
are labels, and the only purpose they serve is to identify the members 
within a given class. 

An ordinal variable is a property defined by an operation which 
permits the rank ordering of the members of a group; that is, not only 
are statements of equality and difference possible, but also statements of 
the kind greater than or less than. Statements about the equality of 
differences between members or the number of times one member is 
greater than or less than another are not possible. If a judge is required 
to order a group of individuals according to aggressiveness, or 
cooperativeness, or some other quality, the resulting variable is ordinal in type. 
Many of the variables used in psychology are ordinal. 

An interval variable is a property defined by an operation which 
permits the making of statements of equality of intervals, in addition to 
statements of sameness or difference or greater than or less than. An 
interval variable does not have a “true” zero point, although a zero point 
may for convenience be arbitrarily defined. Fahrenheit and centigrade 
temperature measurements constitute interval variables. Consider 
three objects, A, B, and C, with temperatures 12°, 24°, and 36°, 
respectively. It is appropriate to say that the difference between the 
temperature of A and B is equal to the difference in the temperature of B and 
C. It is appropriate also to say that the difference between the 
temperature of A and C is twice the difference between the temperature 
of A and B or B and C. It is not appropriate to say that B has twice 
the temperature of A, or that C has three times the temperature of 
A. In common usage, if the temperature yesterday was 64° and today it 
was 32°, we would not say that it was twice as hot yesterday, or that the 
temperature was twice as great, as it was today. Calendar time is also 
an interval variable with an arbitrarily defined zero. 





A ratio variable is a property defined by an operation which permits 
the making of statements of equality of ratios in addition to all other 
kinds of statements discussed above. This means that one variate value, 
or measurement, may be spoken of as double or triple another, and so on. 
An absolute zero is always implied. The numbers used represent dis- 
tances from a natural origin. Length, weight, and the numerosity of 
aggregates are examples of ratio variables. One object may be twice as 
long as another, or three times as heavy, or four times as numerous. 
Many of the variables used in the physical sciences are of the ratio type. 
In psychological work, variables which conform to the requirements of 
ratio variables are uncommon. Scales for measuring loudness, pitch, 
and other variables have been developed by Stevens (1957) at Harvard. 
These appear to satisfy all the conditions of ratio variables. 

The essential difference between a ratio and an interval variable is 
that for the former the measurements are made from a true zero point, 
whereas for the latter the measurements are made from an arbitrarily 
defined zero point or origin. Because of this, for a ratio variable ratios 
may be formed directly from the variate values themselves, and meaning- 
fully interpreted. For an interval variable, ratios may be formed from 
differences between the variate values. The differences constitute a 
ratio variable, because the process of subtraction eliminates, or cancels 
out, the arbitrary origin. Differences are the same regardless of the 
location of the zero or origin. 
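
The point that differences are unaffected by the location of the origin may be checked numerically. The sketch below is illustrative and assumes that the temperatures 12°, 24°, and 36° of objects A, B, and C are Fahrenheit readings, an assumption not stated in the text. Converting them to centigrade shifts the origin: the ratio of the values themselves changes, whereas the ratio of the differences does not.

```python
def fahrenheit_to_centigrade(f):
    """Convert a Fahrenheit reading to centigrade."""
    return (f - 32) * 5 / 9

# Temperatures of objects A, B, and C (assumed here to be Fahrenheit).
a_f, b_f, c_f = 12.0, 24.0, 36.0
a_c, b_c, c_c = (fahrenheit_to_centigrade(t) for t in (a_f, b_f, c_f))

# Ratios of the values depend on the arbitrary zero point.
ratio_values_f = b_f / a_f
ratio_values_c = b_c / a_c

# Ratios of differences do not; subtraction cancels the origin.
ratio_differences_f = (c_f - a_f) / (b_f - a_f)
ratio_differences_c = (c_c - a_c) / (b_c - a_c)

print(f"B/A in Fahrenheit: {ratio_values_f:.2f}, in centigrade: {ratio_values_c:.2f}")
print(f"(C-A)/(B-A) in Fahrenheit: {ratio_differences_f:.2f}, "
      f"in centigrade: {ratio_differences_c:.2f}")
```

The statement "B has twice the temperature of A" therefore holds on one scale and fails on the other, while "the difference between A and C is twice the difference between A and B" holds on both.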

A variety of refinements and elaborations can be made on the dis- 
tinction between nominal, ordinal, interval, and ratio variables. For a 
more detailed discussion the reader is referred to Stevens (1951); Thrall, 
Coombs, and Davis (1954); and Torgerson (1958). 

Some writers distinguish between quantitative and qualitative 
variables without being explicit about the nature of this distinction. In 
the present classificatory system nominal and ordinal variables may be 
spoken of as qualitative, and interval and ratio variables as quantitative. 

Statistical methods exist for the analysis of data composed of 
nominal variables, ordinal variables, and interval and ratio variables. 
From the viewpoint of practical statistical work in psychology and 
education the distinction between interval and ratio variables is perhaps 
unimportant, and it is convenient to think of three, and not four, classes 
of variables, with three corresponding classes of statistical method. 
Procedures for the analysis of interval and ratio variables constitute by far 
the largest, and most important, class of statistical method. 

In practice we frequently apply methods appropriate to one class of 
variable in the statistical analysis of other classes of variables. This 
means that we either discard information which we do in fact possess or 
assume that we have information which we do not possess. An example 
of the former situation arises where measurements of the interval or 
ratio type are replaced by ranks for purposes of analysis. Measurements 
of height or weight for a group of N subjects may be replaced by the 
ranks 1, 2, 3, . . . , N, and subsequent analysis based on these ranks. 
Further, measurements may be divided into broad classes, say top third, 
middle third, and bottom third, and treated as a nominal variable. A 
currently popular class of statistical method is called nonparametric 
statistics (see Chap. 22). Many nonparametric statistical procedures 
convert problems involving interval and ratio variables to problems that 
involve a consideration of either nominal categories or ranks. 
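
The two ways of discarding information described above may be sketched as follows. The six heights are hypothetical and contain no ties; the function `third` and the cut points it uses are assumptions introduced only for the illustration.

```python
# Hypothetical heights in inches for N = 6 subjects (no tied values).
heights = [70.5, 64.0, 68.2, 72.1, 66.3, 69.0]
n = len(heights)

# Replace each measurement by its rank, 1 being the tallest.
ordered = sorted(heights, reverse=True)
ranks = [ordered.index(h) + 1 for h in heights]

def third(rank, n):
    """Assign a rank to the top, middle, or bottom third of the group."""
    if rank <= n / 3:
        return "top"
    if rank <= 2 * n / 3:
        return "middle"
    return "bottom"

# Replace each measurement by a broad class.
classes = [third(r, n) for r in ranks]

print(ranks)    # [2, 6, 4, 1, 5, 3]
print(classes)  # ['top', 'bottom', 'middle', 'top', 'bottom', 'middle']
```

The ranks retain the order of the subjects but not the size of the differences between them. The broad classes retain only a coarse grouping, and treating them as a nominal variable discards the order as well.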

In the analysis of statistical data in psychology and education the 
investigator not infrequently assumes that he has information which 
actually he does not have. Variables which are in fact ordinal may be 
treated by a method appropriate for interval and ratio variables. An 
example of this situation arises when the members of a group are ordered 
with regard to some property. The information consists of relations of 
“greater than” or “less than,” and these are described by a set of ordinal 
numbers; thus one member is first, another second, and so on. It is 
common practice to replace such a set of ordinal numbers by the 
corresponding set of cardinal numbers, 1, 2, 3, . . . , N, and to proceed to 
apply arithmetical operations to these numbers. This means that cer- 
tain assumptions are made. Information is superimposed on the data 
which the measuring operation did not yield; that is, for computational 
purposes we assume we are in possession of information which actually 
we do not have. In the above instance we are making an assumption 
about the equality of intervals when in fact the measuring operation 
employed does not yield information of this kind. The assumption is 
that the difference between the first and second individual is equal to the 
difference between the second and third, and so on. 

In psychological work many variables are in fact ordinal, although for 
statistical purposes they are, quite justifiably, commonly treated as if they 
were interval or ratio variables. For example, scores on intelligence 
tests, scholastic-aptitude tests, attitude tests, personality tests, and the 
like, are in effect ordinal variables, although they are commonly treated as 
if they were of the interval or ratio type. No aspect of the operation of 
measuring intelligence, let us say, is such as to permit the making of 
meaningful statements about the equality of intervals or ratios. We 
cannot say that the difference in intelligence between a person with an 
IQ of 80 and one with an IQ of 90 is in any sense equal to the difference in 
intelligence between a person with an IQ of 110 and one with an IQ of 
120. Nor can we meaningfully assert that a person with an IQ of 120 is 
twice as intelligent as a person with an IQ of 60. Such statements are 
without meaning. Despite this, IQ's are commonly treated by statistical 
methods which, from a rigorously logical viewpoint, are appropriate only 
to interval and ratio variables. The suggestion is not made here that 
the practice of assuming that we have information we do not have, or the 
converse practice of discarding information we do in fact have, be 
discontinued, although a logical purist might be led to this position. 
Frequently practical necessity dictates a particular procedure. Neverthe- 
less it is a matter of some importance to know the nature of the informa- 
tion contained in the data. We should be able to distinguish clearly 
between this and the information either imposed or discarded for the 
purpose of making some process of calculation possible. In other words, 
our understanding of precisely what we are doing is enriched by knowing 
the nature of the assumptions made at each stage in the application of 
any procedure. 


1.8 

Experimental and correlational investigations 

Much scientific enquiry is concerned with an exploration of the relations 
between variables. Some investigations involve a study of the relations 
between many variables; others, of the relation between two variables 
only, and are of the general form Y = f(X). To illustrate the two-variable 
situation, Y may be a measure of motor performance; and X, four 
different dosages of alcohol, each dosage administered to 10 different 
individuals. Y may be a measure of intensity of belief in the prevalence 
of witches; and X, three specified times before, during, and after 
Halloween. Y may be average first-year marks in a university; and X, 
scores on a scholastic-aptitude test. Again Y may be the presence or 
absence of cancer of the lung at age 50; and X, some index or measure of 
the amount of cigarette smoking. In these examples the investigator is 
concerned with an examination of the nature of the relation between two 
variables. The reader should note here that both Y and X may be 
nominal, ordinal, interval, or ratio variables. Thus both Y and X may 
be nominal, Y may be nominal and X ordinal or interval, and so on. 

A useful and, indeed, important distinction may be made between 
experimental and correlational investigations. In an experiment the 
values of the X variable, and the frequency of occurrence of these values, 
are fixed by the investigator. In the illustrative example on the relation 
between motor performance and alcohol, the investigator determines the 
number of dosages, the amount of each dosage, and the number of 
experimental subjects receiving each dosage. If the experiment were 
repeated to check the results, the same dosage of alcohol would be used. 
In a correlational study the particular values of the variable, and the 
frequency of their occurrence, are not fixed, or controlled, by the investi- 
gator. In the example on the relation between first-year averages in 
university and scholastic-aptitude test scores, the investigator may draw 
a sample of students for whom both first-year averages and scholastic- 
aptitude test scores are available. He may then proceed to study the 
relation between these two variables. He exerts no control over the 
magnitude of particular scholastic-aptitude test scores or the frequency of 
their occurrence. If the investigation were repeated, another sample of 
subjects would be used, and the particular scholastic-aptitude test 
scores, and the frequency of their occurrence, might be expected to differ 
in some degree from that found in the first enquiry. 

In psychology and education use is made of both experimental and 
correlational studies. The experimentalist is himself the creator of 
variation. The correlationist studies the variation which already 
exists in nature. Cronbach (1957) has examined in detail the relation 
between the experimental and correlational disciplines in psychology. 
He writes, 

The well-known virtue of the experimental method is that it brings 
situational variables under tight control. It thus permits rigorous 
tests of hypotheses and confident statements about causation. 
The correlational method, for its part, can study what man has 
not learned to control. Nature has been experimenting since the 
beginning of time, with a boldness and complexity far beyond the 
resources of science. The correlator’s mission is to observe and 
organize the data of nature’s experiments. 


1.9 

On calculating 

If possible, skill in the operation of a calculating machine should be 
acquired at an early stage in the study of statistics. Many of the 
statistical problems which present themselves in experimental work in 
psychology involve much computation, and without a calculator, or ready 
access to a digital computer, the arithmetical labor is prohibitive. Skill 
in the operation of a calculator can be readily acquired, a reasonable level 
of performance on simple operations being attained by most students in 
a few hours of practice. Not only can the simple arithmetical operations 
of addition, subtraction, multiplication, and division be rapidly performed 
on many of the widely used calculating machines, but also many short 
cuts and combinations of operations are possible. For example, the sum 
of products of two sets of variate values ΣXY may be accumulated. The 
value of the term ΣXY is required in the calculation of the correlation 
coefficient, a statistic which provides a measure of the relationship 
between two variables. Statistical procedures can frequently be adapted 
to suit the capabilities of particular machines. A valuable aid in 
computing is Barlow's Tables (Comrie, 1947). These tables were originally 
prepared by Peter Barlow at the Royal Military Academy, Woolwich, 
and first published in 1814. The 1941 edition of the Tables provides the 
squares, cubes, square roots, cube roots, and reciprocals of all integers up 
to 12,500. 

In computing, the importance of adequate checks on the accuracy of 
the calculation cannot be too emphatically stressed. Every calculation 
should be checked either by repetition or by the employment of some 
checking device which guarantees accuracy. There is no substitute for 
accuracy. The conduct of an experiment serves no purpose unless correct 
inferences are drawn from the data. The correctness of the inferences 
drawn cannot be assured unless the statistical procedures employed 
are appropriate to the data and unless these procedures are accurately 
applied. Students not infrequently feel that the statistical analysis of a 
set of data is laborious and time-consuming and in their haste to arrive at 
some kind of result may disregard checks which are necessary to ensure 
the accuracy of their calculations. When tempted in this direction, the 
student should observe that the time spent in the proper statistical 
analysis of a set of data represents in most instances a small proportion 
of the time required to plan the experiment and gather the data. A slip- 
shod analysis may throw in jeopardy the total investment of time and 
effort. 

Many complex forms of statistical analysis, which previously 
involved prohibitive amounts of calculation, are now rapidly performed 
on digital computers. The availability of such computers has expanded 
greatly the applications of statistics. 

1.10 

Units of measurement 

When dealing with continuous variables a unit of measurement may be 
regarded as any defined subdivision of a scale, however fine. In measuring 
length the units may be inches, yards, and miles or centimeters, 
meters, and kilometers. In measuring weight the units may be ounces 
and pounds or grams and kilograms. In measuring chronological time 
the units may be seconds, minutes, hours, days, months, or years. 

With continuous variables, although all values are theoretically 
possible within any range of values, we select a unit of measurement and 
record our observations as discrete values. All experimental observations, 
however obtained, are recorded as discrete values. Thus the 
length of a desk or the height of a man may be measured to the nearest 
inch, or tenth of an inch, or hundredth of an inch, the unit of measurement 
in each case being 1 in., 1/10 in., or 1/100 in., respectively, and the number of 
such units involved in any particular measurement must, of necessity, be 
recorded as a discrete number. 

The fineness of the unit of measurement employed is determined by 
the accuracy which the nature of the situation demands or by the accu- 
racy which the instrument of measurement allows, or both. In the 
measurement of time intervals, for example, great accuracy can be 
obtained by the use of appropriate measuring devices. In measuring 
the time required for a child to solve a problem it is certainly adequate 
for all practical purposes to record the observation in seconds. In 
reaction-time experiments, however, we may require a unit of measure- 
ment of a hundredth or perhaps a thousandth part of a second. Further, 
the unit should reflect the accuracy of the measuring operation. To 
illustrate, an intelligence quotient is calculated by dividing mental age 
by chronological age, both expressed in months, and multiplying by 100. 
Quite clearly, we could speak of a child's intelligence quotient as being 
103.3, or 103.23, or something of the sort. Such an attempt at accuracy 
would be spurious because of the large error of measurement which is 
known to attach to the intelligence quotient. In practice, intelligence 
quotients are always recorded to the nearest whole number. 
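
The calculation of the intelligence quotient described above, and the rounding to the nearest whole number, may be sketched as follows. The function name and the ages used are hypothetical. Python's built-in `round` is used here as a convenience; its treatment of values falling exactly halfway between two whole numbers is a property of the language and not a convention taken from the text.

```python
def intelligence_quotient(mental_age_months, chronological_age_months):
    """Mental age divided by chronological age, multiplied by 100,
    recorded to the nearest whole number."""
    raw = 100 * mental_age_months / chronological_age_months
    return round(raw)

# Hypothetical child: mental age 128 months, chronological age 124 months.
raw_value = 100 * 128 / 124
print(f"{raw_value:.2f}")               # 103.23, a spuriously accurate figure
print(intelligence_quotient(128, 124))  # 103, the value as recorded
```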

When we record measurements of a continuous variable as discrete 
numbers in so many units, we imply in most cases that had a more 
accurate form of measurement been used, were this possible and desirable, 
the value thereby obtained would fall within certain limits, these 
limits being defined as one-half a unit above and below the value reported. 
Thus when we report a measurement to the nearest inch, say, 26 in., 
this is assumed to mean that the observation falls within the limits 25.5 
and 26.5, or more precisely that it is greater than or equal to 25.5 and less 
than 26.5. Likewise, a measurement made to the nearest tenth part 
of an inch, say, 31.7, is assumed to fall within the limits 31.65 and 31.75. 
In a reaction-time experiment a particular observation measured to the 
nearest thousandth of a second might be, say, .196 sec. This means that 
the measurement is taken as falling within the limits .1955 and .1965 sec. 
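
The rule of one-half a unit above and below the reported value may be expressed as a short function. The sketch below is illustrative; the function name `measurement_limits` is an assumption, and decimal arithmetic is used only so that the limits print exactly as they appear in the text.

```python
from decimal import Decimal

def measurement_limits(reported, unit):
    """Return the lower and upper limits implied by a reported value:
    one-half a unit below and one-half a unit above it."""
    reported = Decimal(reported)
    half_unit = Decimal(unit) / 2
    return reported - half_unit, reported + half_unit

# The three illustrations given in the text.
print(measurement_limits("26", "1"))        # nearest inch
print(measurement_limits("31.7", "0.1"))    # nearest tenth of an inch
print(measurement_limits("0.196", "0.001")) # nearest thousandth of a second
```

The lower limit is taken as included in the interval and the upper limit as excluded, in keeping with the convention stated above.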

An exception to the above is age. When we state that a person is 
18 years old, we do not mean in conventional usage that his age falls 
within the limits 17 years 6 months and 18 years 6 months. A person is 
ordinarily spoken of as 18 years old until his 19th birthday. His age is 





(N − 1)/2 such pairs, plus the middle term, which is equal to (N + 1)/2. 
The sum of the series is then 

$$\frac{(N - 1)(N + 1)}{2} + \frac{N + 1}{2} = \frac{N(N + 1)}{2}$$

An expression frequently encountered in statistics is 
$\sum_{i=1}^{N} X_i Y_i$. This refers to the sum of the products of two 
sets of paired numbers. If, for example, 5, 6, 12, 15 are the scores, X, of 
four people on a test, and 2, 3, 7, 10 are the scores, Y, of the same four 
people on another test, then $\sum_{i=1}^{N} X_i Y_i$ refers to the sum 
of products and is equal to 5 × 2 + 6 × 3 + 12 × 7 + 15 × 10, or 262. 
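
Both results may be verified by direct computation. The sketch below uses the scores given in the text; the variable and function names are assumptions introduced for the illustration.

```python
# Scores of four people on two tests, as given in the text.
x_scores = [5, 6, 12, 15]
y_scores = [2, 3, 7, 10]

# Sum of products of the paired scores.
sum_of_products = sum(x * y for x, y in zip(x_scores, y_scores))
print(sum_of_products)  # 262

def sum_first_integers(n):
    """Sum of the integers 1, 2, ..., n by the formula n(n + 1)/2."""
    return n * (n + 1) // 2

# The formula agrees with term-by-term addition.
print(sum_first_integers(10), sum(range(1, 11)))  # 55 55
```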

The notation used in elementary statistics is simple, and skill in its 
manipulation can be acquired with a little practice. A good understand- 
ing of the nature of statistical method and its applications can be acquired 
with very little in the way of mathematical training at all. A little 
knowledge of arithmetic and a little elementary algebra go a long way in 
the study of statistics. 

EXERCISES 

1 Indicate with examples the differences between (a) population and 
sample, (b) finite and infinite populations, (c) descriptive and 
sampling statistics, (d) parameters and estimates, (e) dependent and 
independent variables, (f) continuous and discrete variables, (g) 
experiments and correlational studies. 

2 Classify the following as nominal, ordinal, interval, or ratio variables: 
(a) height, (b) weight, (c) examination marks, (d) sex, (e) eye color, 
(f) calendar time, (g) age, (h) racial origin, (i) temperature, (j) ratings 
of scholastic success. 

3 Write the following in summation notation: 

a X x + X 2 + • • ■ + X L6 

b Fi+F 2 + * * + F^ 

C (Xx + Y\) + (X 2 + Ft) + • * • + (X 7 + Ft) 
d XxYx + X 2 Y 2 + ■ ■ ■ + XyYjy 
e XSYx + X 2 *Y 2 + • • • + X n *Y n 

i (Xx + c) + (X 2 + c) + • • • + (X s -t c) 

g cXi + cX 2 + • * + cX 2 5 

h X\/c + X 2 /c + • • • + Xx/c 
i cXx*Y x + cXJYx + • • ■ + cX„*Y n 





4 Write each of the following in full 


5 Show that 


a y X. d c 2 X * Y ' 

a«l i- 1 

3 4 

b T X.y, e £ X. + 4c 

t^l * = 1 

5 5 5 

c y (x. + y.) f y x, + c y y, 

r I-* +4 

i-l »»1 t-1 

$$\sum_{i=1}^{N} (X_i + c)^2 = \sum_{i=1}^{N} X_i^2 + 2c \sum_{i=1}^{N} X_i + Nc^2$$


6 Consider the follow mg paired observations 


Calculate 


x, - n 

y, - 9 

x* » 5 

y* = 2 

x, - r> 

>’1 - 1 

X 4 - 4 

y 4 = 0 

X, = 8 

>\ = a 

a IX, 2 

f S(X, 

b sy, 2 

g 2( r »X, 


c 2(X. - .'»)* 
d s(y, - c" 
e 2X,y, 


h six. - ry 

i 25(A\/J\) 


In all instances the summation is understood to extend over the five 
paired observations. 

7 Which of the following are true and which are false? 

« i x - i y ' = i x ' y ' 

ti »— i *«-i 

>> d x -Y * i ** 

t~l *-l 

c Y (X, c)(X, - r) - Y X s - Xc 2 

. i <- 1 

d Y (X, + v.y = j x . 2 + £ y,* + 2 2 ) x.r, 

1 ■ 1 »-l i-l « — 1 

8 What is the sum of the first 100 integers? 





9 If $X_1 = 4$, $X_2 = 6$, $X_3 = 3$, and $X_4 = 7$, obtain the following: 
a j (X,* - 4X.) b j (X.» - X,* + X.) 

10 Obtain 

4 N V 

t>S r C S 2 

11 Show that 

a j f(X, - X) a + X,(X - 1)] = J X,* - NX 
b f fX.(X. - X) + X*J - f X. J 



Frequency Distributions and Their Graphic Representation 


2.1 

Introduction 


The data obtained from the conduct of experiments or correlational 
studies are very frequently collections of numbers. Classification and 
description of these numbers are required to assist interpretation. Under 
certain circumstances advantages attach to the classification of the data 
in the form of frequency distributions. Such classification may help the 


Table 2.1* 

Intelligence quotients made by 100 pupils on a mental test

109  111   82  105  134  113   90   79  100  117
 80   90  121   75   93   99   90   92   96   82
101  104   80   81   83  104   93  109   72  110
111   91  109  111   81  122   83   92  101   77
 99  103   93   91   67  108   93   84   84  100
102   84   96   89   81  107   95   91  107  102
109   93   82  103  116   86   78   73  104  104
103  108   76   94  108   72   87  121   80  127
105  114  106  119   90   93   89  110  103  100
 99   79  117  134  117   93   82   98   89  119





investigator to understand the nature of important features of the data 
and may possibly reduce arithmetical labor in calculating certain statis- 
tics. A frequency distribution is an arrangement of the data that shows 
the frequency of occurrence of the different values of the variable or the 
frequency of occurrence of values falling within arbitrarily defined inter- 
vals of the variable. This latter statement will become clear as we 
proceed. 


2.2 

Classification of data 

Consider the data in Table 2.1. These are the intelligence quotients of 
100 children obtained from a psychological test. As a first step in the 
direction of classification we may rank order the 100 intelligence quotients 
in order of magnitude, proceeding from the largest to the smallest as shown 
in Table 2.2. An arrangement of this kind is called a rank distribution. 
Such an arrangement of data has few advantages. Inspection of the rank 
data, however, shows that many scores occur more than once; thus there 
are five 103’s, three 100’s, and so on. This suggests that the data might 
be arranged in columns, as shown in Table 2.3, one column listing the


Table 2.2 

Rank distribution of intelligence quotients shown in Table 2.1

134  109  102   93   82
127  109  101   92   82
122  108  101   92   82
121  108  100   91   82
121  108  100   91   81
119  107  100   91   81
119  107   99   90   81
117  106   99   90   80
117  105   99   90   80
117  105   98   90   80
116  104   96   89   79
114  104   96   89   79
113  104   95   89   78
111  104   94   87   77
111  103   93   86   76
111  103   93   84   75
110  103   93   84   73
110  103   93   84   72
109  103   93   83   72
109  102   93   83   67





Table 2.3 

Frequency distribution of intelligence quotients of Table 2.1 with as
many classes as score values

Score   f     Score   f     Score   f     Score   f

134     1     117     3     100     3      83     2
133     -     116     1      99     3      82     4
132     -     115     -      98     1      81     3
131     -     114     1      97     -      80     3
130     -     113     1      96     2      79     2
129     -     112     -      95     1      78     1
128     -     111     3      94     1      77     1
127     1     110     2      93     7      76     1
126     -     109     4      92     2      75     1
125     -     108     3      91     3      74     -
124     -     107     2      90     4      73     1
123     -     106     1      89     3      72     2
122     1     105     2      88     -      71     -
121     2     104     4      87     1      70     -
120     -     103     5      86     1      69     -
119     2     102     2      85     -      68     -
118     -     101     2      84     3      67     1

Table 2.4 

Frequency distribution of the intelligence quotients of Table 2.1

Class
interval    Tally                    Frequency

130-134     /                             1
125-129     /                             1
120-124     ///                           3
115-119     ///// /                       6
110-114     ///// //                      7
105-109     ///// ///// //               12
100-104     ///// ///// ///// /          16
95-99       ///// //                      7
90-94       ///// ///// ///// //         17
85-89       /////                         5
80-84       ///// ///// /////            15
75-79       ///// /                       6
70-74       ///                           3
65-69       /                             1

Total                                   100







possible scores and the other listing the number of times each score occurs. 
Such an arrangement of data is a frequency distribution, and the number of 
times a particular score value occurs is a frequency, represented by the
symbol f.

In Table 2.3 the data have been classified in as many classes as there
are score values within the total range of the variable. The number of
classes is large. Usually it is advisable to reduce the number of classes
by arranging the data in arbitrarily defined groupings of the variable; thus
all scores within the range 65 to 69, that is, all scores with the values 65,
66, 67, 68, and 69, may be grouped together. All scores within the ranges
70 to 74, 75 to 79, and so on, may be similarly grouped. Such groupings
of data are usually done by entering a tally mark for each score opposite
the range of the variable within which it falls and counting these tally
marks to obtain the number of cases within the range. This procedure
is shown in Table 2.4.

The range of the variable adopted is called the class interval. In the 
illustration in Table 2.4 the class interval is 5. This arrangement of data 
is also a frequency distribution, and the number of cases falling within 
each class interval is a frequency. The only difference between Tables 2.3 
and 2.4 is in the class interval, which is 1 in the former case and 5 in the
latter. 
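The tallying procedure can be sketched in a few lines of Python. This is only an illustration, assuming the 100 scores of Table 2.2 have been typed into a list; the names `scores`, `frequency_distribution`, and `width` are hypothetical and do not come from the text.

```python
from collections import Counter

# The 100 intelligence quotients, as listed in Table 2.2.
scores = [
    134, 127, 122, 121, 121, 119, 119, 117, 117, 117,
    116, 114, 113, 111, 111, 111, 110, 110, 109, 109,
    109, 109, 108, 108, 108, 107, 107, 106, 105, 105,
    104, 104, 104, 104, 103, 103, 103, 103, 103, 102,
    102, 101, 101, 100, 100, 100, 99, 99, 99, 98,
    96, 96, 95, 94, 93, 93, 93, 93, 93, 93,
    93, 92, 92, 91, 91, 91, 90, 90, 90, 90,
    89, 89, 89, 87, 86, 84, 84, 84, 83, 83,
    82, 82, 82, 82, 81, 81, 81, 80, 80, 80,
    79, 79, 78, 77, 76, 75, 73, 72, 72, 67,
]

def frequency_distribution(values, width=1):
    """Count the values falling in each class interval of the given width.

    Each interval starts at a multiple of `width` and is keyed by its
    lowest score, so width=5 gives the keys 65, 70, 75, and so on.
    """
    return Counter((v // width) * width for v in values)

by_score = frequency_distribution(scores)      # class interval of 1
by_fives = frequency_distribution(scores, 5)   # class interval of 5

# Print the grouped distribution with the largest scores at the top.
for lower in sorted(by_fives, reverse=True):
    print(f"{lower}-{lower + 4}: {by_fives[lower]}")
```

With a width of 1 the counts correspond to Table 2.3, and with a width of 5 they correspond to Table 2.4.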


2.3

Conventions regarding class intervals 

In the arrangement of data with a class interval of 1, as shown in Table
2.3, the original observations are retained and may be reconstructed
directly from the frequency distribution without loss of information. If
the class interval is greater than 1, say, 2, 5, or 10, some loss of information
regarding individual observations is incurred; that is, the original observa- 
tions cannot be reproduced exactly from the frequency distribution. If 
the class interval is large in relation to the total range of the set of observa- 
tions, this loss of information may be appreciable. If the class interval 
is small, the classification of data in the form of a frequency distribution 
may lead to very little gain in convenience over the utilization of the 
original observations. 

The rules listed below are widely used in the selection of class inter- 
vals. These rules lead in most cases to a convenient handling of the data. 

1 Select a class interval of such a size that between 10
and 20 such intervals will cover the total range of the
observations. For example, if the smallest observation
in a set were 7 and the largest 156, a class interval
of 10 would be appropriate and would result in an
arrangement of the data into 16 intervals. If the
smallest observation were 2 and the largest 38, a class
interval of 3 would result in an arrangement of 14
intervals. If the observations ranged from 9 to 20,
a class interval of 1 would be convenient.

2 Select class intervals with a range of 1, 2, 3, 5, 10, or
20 points. These will meet the requirements of most
sets of data.

3 Start the class interval at a value which is a multiple
of the size of that interval. For example, with a class
interval of 5, the intervals should start with the values
5, 10, 15, 20, etc. With a class interval of 2, the intervals
should start with the values 2, 4, 6, 8, 10, etc.
This is, of course, highly arbitrary.

4 Arrange the class intervals according to the order of
magnitude of the observations they include, the class
interval containing the largest observations being
placed at the top.
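As a rough sketch of how rules 1, 3, and 4 work together, the function below lists the class intervals needed to cover a range of whole-number scores. The function name `class_intervals` and its arguments are illustrative assumptions, and the example uses the intelligence quotients of Table 2.2, which run from 67 to 134.

```python
def class_intervals(lowest, highest, width):
    """Return (lower, upper) score limits, largest interval first.

    Intervals start at multiples of `width` (rule 3) and are listed
    with the largest observations at the top (rule 4). Scores are
    assumed to be non-negative whole numbers.
    """
    start = (lowest // width) * width
    intervals = []
    while start <= highest:
        intervals.append((start, start + width - 1))
        start += width
    return intervals[::-1]

iq = class_intervals(67, 134, 5)
print(len(iq), iq[0], iq[-1])

# Rule 1: the number of intervals should fall between 10 and 20.
print(10 <= len(iq) <= 20)
```

For these data a class interval of 5 yields 14 intervals, from 130-134 at the top down to 65-69, which satisfies rule 1.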


2.4 

Exact limits of the class interval 

Where the variable under consideration is continuous, and not discrete,
we select a unit of measurement and record our observations as discrete
values. When we record an observation in discrete form and the variable
is a continuous one, we imply that the value recorded represents a value
falling within certain limits. These limits are usually taken as one-half a
unit above and below the value reported. Thus when we report a measurement
to the nearest inch, say, 16 in., we mean that if a more accurate
form of measurement had been used the value obtained would fall within
the limits 15.5 and 16.5 in. Similarly, a measurement made to the nearest
tenth part of an inch, say, 31.7 in., is understood to fall within the limits
31.65 and 31.75 in. In a reaction-time experiment a particular observation
measured to the nearest thousandth of a second might be, say, .196
sec. This assumes that had a more accurate timing device been used, the
measurement would have been found to fall somewhere within the limits
.1955 and .1965 sec.

Class intervals are usually recorded to the nearest unit and thereby
reflect the accuracy of measurement. For various reasons it is frequently
necessary to think in terms of so-called exact limits of the class interval.
These are sometimes spoken of as class boundaries, or end values, and
sometimes as real limits. Consider the class interval 95 to 99 in Table
2.4. We grouped within this interval all measurements taking the values
95, 96, 97, 98, and 99. The limits of the lower value are 94.5 and 95.5,
while those of the upper value are 98.5 and 99.5. The total range, or
exact limits, which the interval is presumed to cover is then clearly 94.5
and 99.5, which means all values greater than or equal to 94.5 and less
than 99.5.

The above discussion is applicable to continuous variables only. 
With discrete variables no distinction need be made between the 
class interval and the exact limits of the interval, the two being 
identical. 
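The half-unit convention for a continuous variable may be illustrated with a short sketch. The helper names `exact_limits` and `interval_exact_limits` are assumptions introduced here for illustration; decimal arithmetic is used so that values such as 31.65 come out exactly.

```python
from decimal import Decimal

def exact_limits(value, unit):
    """Exact limits of a recorded value: half a unit below and above it."""
    value, unit = Decimal(value), Decimal(unit)
    half = unit / 2
    return value - half, value + half

def interval_exact_limits(lower, upper, unit="1"):
    """Exact limits of a class interval written as lower to upper."""
    return exact_limits(lower, unit)[0], exact_limits(upper, unit)[1]

# The three measurements discussed in this section.
for value, unit in [("16", "1"), ("31.7", "0.1"), ("0.196", "0.001")]:
    low, high = exact_limits(value, unit)
    print(f"{value}: {low} to {high}")

# The class interval 95 to 99 of Table 2.4.
low, high = interval_exact_limits(95, 99)
print(f"95-99: {low} to {high}")
```

Values and units are passed as strings or whole numbers rather than floating-point numbers, since binary floating point cannot represent a unit such as 0.1 exactly.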

Table 2.5 shows the frequency distribution of the intelligence quotients
of Table 2.1. Column 1 shows the class interval as usually written,
while col. 2 records the exact limits. In practice, of course, the exact
limits are rarely recorded as in Table 2.5.


Table 2.5 

Class intervals, exact limits, and mid-points for frequency
distribution of intelligence quotients

    1            2              3             4
Class                       Mid-point of
interval    Exact limits    interval      Frequency

130-134     129.5-134.5       132.0            1
125-129     124.5-129.5       127.0            1
120-124     119.5-124.5       122.0            3
115-119     114.5-119.5       117.0            6
110-114     109.5-114.5       112.0            7
105-109     104.5-109.5       107.0           12
100-104      99.5-104.5       102.0           16
95-99        94.5-99.5         97.0            7
90-94        89.5-94.5         92.0           17
85-89        84.5-89.5         87.0            5
80-84        79.5-84.5         82.0           15
75-79        74.5-79.5         77.0            6
70-74        69.5-74.5         72.0            3
65-69        64.5-69.5         67.0            1

Total                                        100





2.5

Distribution of observations within the 
class interval 

The grouping of data in class intervals results in a loss of information
regarding the individual observations themselves. Scores may differ one
from another within a limited range, and yet all be grouped within the
same interval. In the calculation of certain statistics and in the preparation
of graphs it becomes necessary to make certain assumptions regarding
the values within the intervals. Two separate assumptions may be made,
depending on the purposes we have in mind.

The first assumption states that the observations are uniformly distributed
over the exact limits of the interval. This assumption is made
in the calculation of such statistics as the median, quartiles, and percentiles
and in the drawing of histograms. In Table 2.5 it will be observed
that 16 cases fall within the interval 100 to 104, which has the exact
limits 99.5 to 104.5. The assumption states that these 16 cases are distributed
over the interval as follows:


Interval        Frequency

103.5-104.5        3.2
102.5-103.5        3.2
101.5-102.5        3.2
100.5-101.5        3.2
 99.5-100.5        3.2

Total             16.0
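The arithmetic behind this uniform-distribution assumption can be sketched as follows. The function name `spread_uniformly` and its arguments are hypothetical; exact fractions are used so that the shares add back to the original frequency.

```python
from fractions import Fraction

def spread_uniformly(frequency, lower_exact, upper_exact, step=1):
    """Divide `frequency` evenly among sub-intervals of width `step`."""
    lower = Fraction(lower_exact)
    upper = Fraction(upper_exact)
    step = Fraction(step)
    count = (upper - lower) / step         # number of sub-intervals
    share = Fraction(frequency) / count    # cases assigned to each one
    return [(lower + i * step, lower + (i + 1) * step, share)
            for i in range(int(count))]

# The 16 cases in the interval with exact limits 99.5 to 104.5.
rows = spread_uniformly(16, "99.5", "104.5")
for low, high, share in reversed(rows):
    print(f"{float(low)}-{float(high)}: {float(share)}")
```

Each of the five unit sub-intervals receives 16/5, or 3.2, cases.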


The second widely used assumption states that all the observations 
are concentrated at the mid-point of the interval, that is, that all the 
observations for that interval are the same and equal to the value corre- 
sponding to the mid-point of the interval. The mid-point of any class 
interval is half way between the exact limits of the interval. In the above 
example the mid-point of the interval 99.5 to 104.5 is 102. This second
assumption is ordinarily made in the calculation of such statistics as
means, standard deviations, and in the drawing of frequency polygons.

The determination of the mid-point of a class interval should present 
no difficulty. The mid-point may be conveniently obtained by adding 
one-half of the range of the class interval to the lower exact limit of that 
interval. Thus with the interval 100 to 104 the lower limit is 99.5 and 
one-half the class interval is 2.5. The mid-point is therefore 99.5 + 2.5, 
or 102. Consider a 10-point class interval written in the form 100 to 109. 
Here the lower limit is 99.5 and one-half the class interval is 5. The
mid-point is then 99.5 + 5, or 104.5. Table 2.5, col. 3, shows the mid-points
of the corresponding class intervals.
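The rule just described, lower exact limit plus one-half the range of the interval, is easy to express in code. The sketch below assumes whole-number scores recorded to the nearest unit; the function name `midpoint` is illustrative only.

```python
def midpoint(lower, upper, unit=1):
    """Mid-point of a class interval written as lower to upper.

    The lower exact limit lies half a unit below `lower`, and the range
    of the interval is the distance between its two exact limits.
    """
    lower_exact = lower - unit / 2
    interval_range = (upper - lower) + unit
    return lower_exact + interval_range / 2

print(midpoint(100, 104))   # 5-point interval: 99.5 + 2.5
print(midpoint(100, 109))   # 10-point interval: 99.5 + 5
```

For measurements recorded in decimal units, exact decimal arithmetic would be a safer choice than the floating-point arithmetic used here.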





2.6 

Cumulative frequency distributions 

Situations occasionally arise where our concern is not with the frequencies
within the class intervals themselves, but rather with the number or percentage
of values "greater than" or "less than" a specified value. Such
information may be made readily available by the preparation of a
cumulative frequency distribution. The cumulative frequencies are
obtained by adding successively, starting from the bottom, the individual
frequencies. Table 2.6 shows the cumulative frequencies and cumulative
percentages for a distribution of intelligence quotients.
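The successive addition can be sketched as below, using the frequencies of Table 2.6, which total 106. The variable names are illustrative assumptions; the frequencies are listed from the bottom interval upward because the accumulation starts at the bottom.

```python
from itertools import accumulate

# Table 2.6, listed from the bottom interval (65-69) up to the top (130-134).
lower_limits = list(range(65, 135, 5))
frequencies = [1, 0, 5, 6, 8, 11, 14, 20, 15, 8, 10, 4, 3, 1]

cumulative = list(accumulate(frequencies))   # running totals from the bottom
n = cumulative[-1]                           # total number of cases
percentages = [round(100 * cf / n, 1) for cf in cumulative]

# Print with the largest scores at the top, as in the table.
rows = list(zip(lower_limits, frequencies, cumulative, percentages))
for lower, f, cf, pct in reversed(rows):
    print(f"{lower}-{lower + 4}  {f:3d}  {cf:4d}  {pct:6.1f}")
```

Each cumulative percentage is the cumulative frequency divided by the total number of cases and multiplied by 100.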


2.7 

Tabular representation 

Statistical data are frequently arranged and presented in the form of 
tables. Such tables should be designed to enable the reader to grasp with 
minimal effort the information which they intend to convey. While very 
considerable variety in the design of statistical tables is possible, a number 


Table 2.6 

Cumulative frequencies and cumulative percentage frequencies
for distribution of intelligence quotients

    1            2            3             4
Class                                   Cumulative
interval                  Cumulative    percentage
(IQ's)      Frequency     frequency     frequency

130-134          1           106          100.0
125-129          3           105           99.1
120-124          4           102           96.2
115-119         10            98           92.5
110-114          8            88           83.0
105-109         15            80           75.5
100-104         20            65           61.3
95-99           14            45           42.5
90-94           11            31           29.2
85-89            8            20           18.9
80-84            6            12           11.3
75-79            5             6            5.7
70-74            0             1             .9
65-69            1             1             .9

Total          106







of general rules should be observed. Kenney (1954) lists six such rules, 
and these are as follows: 1 

1 Every table must be self-explanatory. To accomplish this the
title should be short, but not at the expense of clearness.

2 Full explanatory notes, when necessary, should be incorporated
in the table, either directly under the descriptive title and before
the body of the table, or else directly under the table.

3 The columns and rows should be arranged in logical order to
facilitate comparisons.

4 In tabulating long columns of figures, space should be left after
every five or ten rows. Long unbroken columns are confusing,
especially when one is comparing two numbers in a row but in
widely separated columns.

5 If the numbers tabulated have more than three significant figures,
the digits should be grouped in threes. Thus, one should write
4 685 732, not 4685732.

6 Double lines at the top (or at the top and bottom) may enhance
the effectiveness of a table. If the table nicely fills the width
of a page, no side lines should be used. In such cases the omission
of the side lines will have the tendency to emphasize the
other vertical lines and cause the interior columns to stand out
better. The columns should not be widely separated, and the
form of a narrow, compact table should have its side lines.

Tables presented as part of a manuscript should be appropriately
numbered and should be inserted where possible in close proximity to the
place where they are referred to in the text; otherwise the reader is put to
some inconvenience.

The appropriate design of statistical tables can become a matter of 
some complexity. This is particularly the case where it is necessary to 
present data which are cross-classified in a variety of ways. 


2.8 

Graphic representation of frequency distributions 

Graphic representation is often of great help in enabling us to comprehend 
the essential features of frequency distributions and in comparing one 
frequency distribution with another. A graph is the geometrical image of
a set of data. It is a mathematical picture. It enables us to think about 

1 Reproduced, with permission, from John F. Kenney and E. S. 
Keeping, Mathematics of statistics, part 1, 3d ed., copyright 1954,
D. Van Nostrand Company, Inc., Princeton, N.J. 





a problem in visual terms. Graphs are used not only in the practical 
handling of real sets of data, but also as visual models in thinking about 
statistical problems. Many problems can be reduced to visual form, and 
such reduction often facilitates their understanding and solution. Graphs 
have become a part of our everyday activity. Newspapers, popular 
magazines, trade publications, business reports, and scientific periodicals 
use graphic representation extensively. Graphic representation has been 
carefully studied, and much has been written on the subject. For a more 
detailed account than is given here, see Johnson and Jackson (1953, 
Chap. 3). While graphic representation has many ramifications, we shall 
consider here only those aspects of the subject which are useful in visualizing
the important properties of frequency distributions and the ways in
which one frequency distribution may differ from another. 


2.9 

Histograms 

A histogram is a graph in which the frequencies are represented by areas 
in the form of bars. Table 2.7 presents measures of auditory reaction 
time for a sample of 188 subjects. 


Table 2.7 

Frequency distribution of auditory reaction times for a sample of
188 University of Chicago undergraduates*

Class
interval,    Mid-point                   Cumulative
sec          of interval    Frequency    frequency

.34-.35         .345            2           188
.32-.33         .325            2           186
.30-.31         .305            4           184
.28-.29         .285            5           180
.26-.27         .265           11           175
.24-.25         .245           17           164
.22-.23         .225           28           147
.20-.21         .205           69           119
.18-.19         .185           37            50
.16-.17         .165           12            13
.14-.15         .145            1             1

Total                         188







fig. 2.1 Histogram for data of Table 2.7. Auditory reaction times for 188 students. Horizontal axis: reaction time, seconds.


Figure 2.1 shows the frequencies plotted in the form of a histogram.
To prepare such a histogram proceed as follows. Obtain a piece of
suitably cross-sectioned graph paper. Paper subdivided into tenths of an
inch with heavy lines 1 in. apart is convenient. Draw a horizontal line
to represent reaction time in seconds and a vertical line to represent
frequencies. Select an appropriate scale, both for reaction time and frequencies.
In the present case if we allow in. for each class interval
and in. for each unit of frequency, we obtain a graph roughly 6 in. long
and 4 in. tall. The scale is arbitrary. The scale suggested in this case,
however, results in a graph of convenient size. The mid-points of the
interval are written along the horizontal base line, and the frequency scale
along the vertical. For each class interval the corresponding frequency
is plotted and a horizontal line drawn the full length of the interval. To
complete the graph we may join the ends of these lines to the corresponding
ends of the intervals on the horizontal axis, although practice in this
regard varies. Both the horizontal and vertical axes must be appropriately
labeled. A concise statement of what the graph is about should
accompany it. Observe that the width of each bar corresponds to the
exact limits of the interval. Observe also that in this type of graph the
frequencies are represented as equally distributed over the whole range of
the interval.
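The geometry of the bars can be sketched in code without any plotting library. The example below is an illustration only: the names `table_2_7`, `UNIT`, `WIDTH`, and `histogram_bars` are assumptions, and the bars are printed sideways as rows of `#` characters rather than drawn on graph paper.

```python
from fractions import Fraction

# Table 2.7, lowest interval first: the lower score limit of each class
# interval (in seconds) and its frequency.
table_2_7 = [
    ("0.14", 1), ("0.16", 12), ("0.18", 37), ("0.20", 69),
    ("0.22", 28), ("0.24", 17), ("0.26", 11), ("0.28", 5),
    ("0.30", 4), ("0.32", 2), ("0.34", 2),
]
UNIT = Fraction("0.01")    # times are recorded to the nearest .01 sec
WIDTH = Fraction("0.02")   # range of each class interval

def histogram_bars(table):
    """Return (left edge, right edge, height) for each bar.

    Each bar spans the exact limits of its interval, and its height is
    the frequency of that interval.
    """
    bars = []
    for lower, freq in table:
        left = Fraction(lower) - UNIT / 2
        bars.append((left, left + WIDTH, freq))
    return bars

bars = histogram_bars(table_2_7)
for left, right, freq in bars:
    print(f"{float(left):.3f}-{float(right):.3f} | " + "#" * freq)
```

Because every bar runs from one exact limit to the next, adjacent bars touch with no gap between them.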







fig. 2.2 Frequency polygon for data of Table 2.7. Auditory reaction times for 188 students. Horizontal axis: reaction time, seconds.


2.10

Frequency polygons 

In a histogram we assume that all the cases within a class interval are
uniformly distributed over the range of the interval. In a frequency
polygon we assume that all cases in each interval are concentrated at the
mid-point of the interval. In this fact resides the essential difference
between a histogram and a frequency polygon. Instead of drawing a
horizontal line the full length of the interval, as in the histogram, we make
a dot above the mid-point of each interval at a height proportional to the
frequency. It is customary to show an additional interval at each end
of the horizontal scale and to join those dots to the dots of the adjacent
interval. A frequency distribution based on the same data as the histogram
in Fig. 2.1 is shown in Fig. 2.2.
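The points to be joined in a frequency polygon can be listed with a short sketch. The function name `polygon_points` is hypothetical, and the additional interval at each end is given a frequency of zero, which is how the customary extra intervals are interpreted here.

```python
from fractions import Fraction

# Table 2.7, lowest interval first: mid-points .145, .165, ..., .345 sec.
frequencies = [1, 12, 37, 69, 28, 17, 11, 5, 4, 2, 2]
width = Fraction(20, 1000)
midpoints = [Fraction(145, 1000) + i * width for i in range(len(frequencies))]

def polygon_points(midpoints, frequencies, width):
    """Polygon vertices, with an empty interval added at each end."""
    xs = [midpoints[0] - width, *midpoints, midpoints[-1] + width]
    ys = [0, *frequencies, 0]
    return list(zip(xs, ys))

points = polygon_points(midpoints, frequencies, width)
for x, y in points:
    print(f"{float(x):.3f}  {y}")
```

Joining successive points with straight lines gives the polygon, which begins and ends on the horizontal axis at .125 and .365.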

Observe that the frequency distribution in Fig. 2.2 is not a smooth
continuous curve, since the lines joining the various points are straight
lines. If we subdivide our intervals into smaller intervals, we shall of
course obtain irregular frequencies, there being too few members in each
interval. Consider, however, a circumstance where our intervals become
smaller and smaller and at the same time the total number of cases
becomes larger and larger. If we carry this process to the extreme situation
when we have an indefinitely small interval and an indefinitely large
number of cases, we arrive at the concept of a continuous frequency
distribution.

fig. 2.3 Cumulative frequency polygon for data of Table 2.7. Auditory reaction times for 188 students. Horizontal axis: reaction time, seconds.


2.11 

Cumulative frequency polygons 

The drawing of a cumulative frequency polygon differs from that of a
frequency polygon in two respects. First, instead of plotting points
corresponding to frequencies, we plot points corresponding to cumulative
frequencies. Second, instead of plotting points above the mid-point of each
interval, we plot our points above the top of the exact limits of the interval.
This is done because we wish our graph to visually represent the
number of cases falling above or below particular values. In plotting the
cumulative frequency distribution shown in Table 2.7 we would plot
the cumulative frequency 188 against the exact upper limit of
the interval, that is, .355, the frequency 186 against .335, and so on.
Figure 2.3 shows the cumulative frequency distribution for the data
appearing in the last column of Table 2.7.
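A brief sketch of the plotting points, using the frequencies of Table 2.7, may make the two differences concrete. The variable names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import accumulate

# Table 2.7, lowest interval (.14-.15) first.
frequencies = [1, 12, 37, 69, 28, 17, 11, 5, 4, 2, 2]

# Exact upper limits of the intervals: .155, .175, ..., .355 sec.
upper_exact = [Fraction(155 + 20 * i, 1000) for i in range(len(frequencies))]

# Cumulative frequencies, added successively from the bottom.
cumulative = list(accumulate(frequencies))

# Each point pairs an exact upper limit with the number of cases below it.
points = list(zip(upper_exact, cumulative))
for x, cf in points:
    print(f"{float(x):.3f}  {cf}")
```

The last two points are 186 against .335 and 188 against .355, in agreement with the last column of Table 2.7.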

We may convert our raw frequencies to percentages such that all the
frequencies added together add up to 100 instead of to the number of
cases. We may then determine the cumulative percentage frequencies.
We may then graph those frequencies and obtain thereby a cumulative
percentage polygon, or ogive. The advantage of this type of diagram is
that from it we can read off directly the percentage of observations less
than any specified value.
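Reading a value off the ogive amounts to straight-line interpolation between the plotted points. The sketch below uses the data of Table 2.7 and makes two assumptions not stated in the text: the curve is taken to start at 0 per cent at the lowest exact limit, .135, and times are expressed in thousandths of a second so that the limits are whole numbers. The function name `percent_below` is hypothetical.

```python
from itertools import accumulate

# Table 2.7, lowest interval first; limits in thousandths of a second.
frequencies = [1, 12, 37, 69, 28, 17, 11, 5, 4, 2, 2]
upper_exact = [155 + 20 * i for i in range(len(frequencies))]   # 155 ... 355

cumulative = list(accumulate(frequencies))
n = cumulative[-1]
cum_percent = [100 * cf / n for cf in cumulative]

# Assumption: the ogive starts at 0 per cent at the lowest exact limit.
xs = [135] + upper_exact
ys = [0.0] + cum_percent

def percent_below(msec):
    """Read the ogive at `msec` by straight-line interpolation."""
    if msec <= xs[0]:
        return 0.0
    if msec >= xs[-1]:
        return 100.0
    for i in range(1, len(xs)):
        if msec <= xs[i]:
            x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
            return y0 + (y1 - y0) * (msec - x0) / (x1 - x0)

# Percentage of reaction times below .195 sec and below .205 sec.
print(round(percent_below(195), 1))
print(round(percent_below(205), 1))
```

A value that falls inside an interval, such as .205, is read from the line joining the two neighboring points, which is the same uniform-distribution assumption made for the histogram.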


2.12 

Some conventions for the construction of graphs 

1 In the graphing of frequency distributions it is cus- 
tomary to let the horizontal axis represent scores and 
the vertical axis frequencies. 

2 The arrangement of the graph should proceed from left 
to right. The low numbers on the horizontal scale 
should be on the left, and the low numbers on the 
vertical scale should be toward the bottom. 

3 The distance along either axis selected to serve as a unit
is arbitrary and affects the appearance of the graph.
Some writers suggest that the units should be selected
such that the ratio of height to length is roughly
3:5. This procedure seems to have some aesthetic
advantages.

4 Whenever possible the vertical scale should be so
selected that a zero point falls at the point of intersection
of the axes. With some data this procedure
may give rise to a most unusual looking graph. In
such cases it is customary to designate the point of
intersection as the zero point and make a small break
in the vertical axis.

5 Both the horizontal and vertical axes should be appropriately
labeled.

6 Every graph should be assigned a descriptive title
which states precisely what it is about.

2.13 

How frequency distributions differ 

Comparison of a number of frequency distributions represented in either
tabular or graphic form indicates that they differ one from another. An
important problem in statistics is the identification and definition of properties
or attributes of frequency distributions which describe how they
differ. It is customary to designate four important properties of frequency
distributions. These are central location, variation, skewness,
and kurtosis. These properties may be viewed either as descriptive of the
frequency distribution itself or as descriptive of the set of observations of
which the distribution is comprised. These alternatives are in effect
synonymous. A frequency distribution is a particular kind of arrangement
of a set of observations. Central location, variation, skewness,
and kurtosis may be discussed either with direct reference to sets of observations
or with reference to the observations arranged in frequency-distribution
form.

Central location refers to a value of the variable near the center of the 
frequency distribution. It is a middle point. Measures of central location
are called averages. These are discussed in detail in Chap. 3 of this
book.

Variation refers to the extent of the clustering about a central value.
If all the observations are close to the central value, their variation will be 
less than if they tend to depart more markedly from the central value. 
Measures of variation are discussed in Chap. 4. 

Skewness refers to the symmetry or asymmetry of the frequency distribution.
If a distribution is asymmetrical and the larger frequencies
tend to be concentrated toward the low end of the variable and the smaller
frequencies toward the high end, it is said to be positively skewed. If the
opposite holds, the larger frequencies being concentrated toward the high
end of the variable and the smaller frequencies toward the low end, the
distribution is said to be negatively skewed.

Kurtosis refers to the flatness or peakedness of one distribution in 
relation to another. If one distribution is more peaked than another, it 
may be spoken of as more leptokurtic. If it is less peaked, it is said to be 
more platykurtic. It is conventional to speak of a distribution as lepto- 
kurtic if it is more peaked than a particular type of distribution known as 
the normal distribution, and platykurtic if it is less peaked. The normal
distribution is spoken of as mesokurtic , which means that it falls between 
leptokurtic and platykurtic distributions. 

Table 2.8 presents hypothetical data illustrating frequency distribu- 
tions with different properties. The distribution in col. 2 is a symmetrical 
binomial, a type of distribution which is of much importance in statistical
work and will be considered in detail in a later chapter. The distribution
in col. 3 has central frequencies which are greater than those for the 
binomial. It is more peaked than the binomial, and as far as kurtosis is 
concerned, it can be said to be leptokurtic. The distribution in col. 4 has 
smaller central frequencies than the binomial and larger frequencies 



Table 2.8

Frequency distributions of different shapes






fig. 2.4 Three frequency distributions identical in shape but with different averages.


toward the extremities. It can be spoken of as platykurtic. The distribution
in col. 5 has uniform frequencies over all class intervals and is
described as rectangular. The distribution in col. 6 has two humps, or
modes. It is said to be bimodal. In the distribution in col. 7 the largest
frequencies occur at the extremities whereas the central frequencies are
the smallest. Such a distribution is said to be U-shaped. The distributions
in cols. 2 to 7 are all symmetrical and have the same measures of
central location although they differ in variation. Column 8 illustrates a
positively skewed and col. 9 a negatively skewed distribution. Extreme
skewness leads to the type of distribution shown in col. 10, which is
described as J-shaped.

2.14 

The properties of frequency distributions 
represented graphically 

The differing characteristics of frequency distributions can be readily
represented in graphical form. Consider the three distributions in Fig.
2.4. These distributions appear identical in shape. They are markedly
different, however, in terms of the central values about which the observations
in each distribution appear to concentrate; that is, they have different
averages although they may be identical in all other respects.
Distribution A has a lower average than B and B than C.

Now consider the distributions in Fig. 2.5. Inspection of these three
distributions suggests that while the observations in each case appear to 
concentrate about the same average, they are nonetheless markedly dif- 
ferent one from another. In the case of distribution A the observations 
appear to be more closely concentrated about the average than in the 






fig. 2.5 Three frequency distributions with the same average but with different variation.

fig. 2.6 Three frequency distributions differing in skewness.

case of B, and the same applies to B in relation to C. Thus these distributions
differ in variation. The observations in A are less variable
than the observations in B, and those in B are less variable than those in C.
Examine now the distributions in Fig. 2.6. These three distributions
have different averages and possibly different measures of variation.
They differ also in skewness. Distribution B is symmetrical about the
average; that is, if we were to fold it over about the average, we should
find that it had the same shape on both sides. A and C are asymmetrical,
the shape to the left of the average being different from the shape to the
right. Distribution A is positively skewed, the longer tail extending





fig. 2.7 Three frequency distributions differing in kurtosis.


toward the high end of the scale. Distribution C is negatively skewed,
the longer tail extending toward the low end of the scale.

Consider now the graphical representation of kurtosis as shown in
Fig. 2.7. Distribution A is a symmetrical bell-shaped distribution known
as the normal distribution. Distribution B is observed to be flatter
on top than the normal distribution and is referred to as platykurtic,
while distribution C is more peaked than the normal and is spoken of as
leptokurtic.

In the above discussion the meaning which attaches to the descriptive
properties of collections of measurements arranged in frequency distributions
is largely intuitive and is derived from the inspection of distributions
in tabular or graphic form. To proceed with the study of data interpretation
we require precisely defined numerical measures of central location,
variation, skewness, and kurtosis. Chapters 3 and 4 to follow are concerned
with the more precise and formal delineation of these properties,
their numerical description, and calculation.


EXERCISES 

1 The following are marks obtained by a group of 40 university students
on an English examination:

42  88  37  75  98  93  73  62  96  80
52  76  66  54  73  69  83  62  53  79
69  56  81  75  52  65  49  80  67  59
88  80  44  71  72  87  91  82  89  79





Prepare a frequency distribution and a cumulative frequency distribu- 
tion for these data using a class interval of 5. 

2 Write down the exact limits and the mid-points of the class intervals for 
the frequency distribution obtained by answering Exercise 1 above. 

3 Write down an acceptable set of class intervals, the exact limits of the 
intervals, and their mid-points, in preparing frequency distributions 
for the following data: (a) error scores ranging from 24 to 87 made by 
a sample of rats in running a maze; (b) intelligence quotients ranging
from 96 to 137 for a group of school children; (c) scholastic-aptitude 
test scores ranging from 227 to 896 obtained by a group of university 
students; (d) response latencies ranging from .276 to .802 sec for a 
group of experimental subjects; (e) supervisors’ ratings from 0 to 9 for 
a group of industrial employees. 

4 Prepare a frequency distribution for the following test scores:

     2  11   6   4  18   1   9   2   2  15
     8  16  12  11  17   3   3 [?]   3   7
    11   9   5  16  16  16   4   9   5   7
     4  10   4   4  15  17   0   5 [?]  18
     5  10   9   8   7   7 [?]   6  13   1

Obtain a cumulative percentage frequency distribution for these data.

5 Prepare histograms for the data in Exercises 1 and 4 above. 

6 Prepare frequency polygons for the data in Exercises 1 and 4 above. 

7 Prepare cumulative frequency polygons for the data in Exercises 1 and 
4 above. 

8 Frequency distributions of intelligence quotients are available for (a)
a random sample of the population at large and (b) a random sample of 
university students. In what ways and for what reasons might you 
expect these two distributions to differ? 

9 In what ways might you possibly expect the frequency distribution of 
marks on a history examination to differ from that for a mathematics 
examination? 



Averages 


3.1

Introduction

Measures of central location used in the description of frequency
distributions are called averages. The word “average” is frequently used to
refer to a value obtained by adding together a set of measurements and
then dividing by the number of measurements in the set. This is one
type of average only and is called the arithmetic mean. In general, an
average is a central reference value which is usually close to the point of
greatest concentration of the measurements and may in some sense be
thought to typify the whole set. For certain purposes a particular
measurement may be viewed as a certain distance above or below the average.
Averages in common use in psychology and education are the arithmetic
mean and the median. Other measures of central location include the
mode, the geometric mean, and the harmonic mean. This chapter
contains a discussion of the arithmetic mean, the median, and the mode.
Discussion of the geometric mean and the harmonic mean has been
omitted because of the infrequency of their use in practice. By far the
most important and widely used measure of central location is the
arithmetic mean. This statistic is the appropriate measure of central location
for interval and ratio variables. The median and mode are sometimes
viewed as appropriate measures for ordinal and nominal variables,
respectively, although they can also be used with interval and ratio variables.


3.2

The arithmetic mean

By definition the arithmetic mean is the sum of a set of measurements
divided by the number of measurements in the set. Consider the
following measurements: 7, 13, 22, 9, 11, 4. The sum of these six
measurements is 66. The arithmetic mean is therefore 66 divided by
6, or 11.

In general, if N measurements are represented by the symbols $X_1$, $X_2$,
$X_3$, . . . , $X_N$, the arithmetic mean in algebraic language is as follows:

$$\bar{X} = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + X_3 + \cdots + X_N}{N} \tag{3.1}$$

The symbol $\bar{X}$, spoken of as X bar, is used to denote the arithmetic mean
of the values of X. The symbol $\sum_{i=1}^{N}$, in which $\Sigma$ is the Greek letter
sigma, describes the operation of summing the N measurements. The
summation extends from i = 1 to i = N.
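The definition in Eq. (3.1) can be sketched in a few lines of Python. This is only an illustration using the six measurements given above; the function name `arithmetic_mean` is an assumption made for this example and does not come from the text.

```python
def arithmetic_mean(values):
    """Sum of the measurements divided by their number N, as in Eq. (3.1)."""
    n = len(values)
    if n == 0:
        raise ValueError("the mean of an empty set is undefined")
    return sum(values) / n

measurements = [7, 13, 22, 9, 11, 4]
print(sum(measurements))              # 66
print(arithmetic_mean(measurements))  # 11.0
```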

3.3

Calculating the mean from frequency distributions

Consider a situation where different values of X occur more than once.
The arithmetic mean is then obtained by multiplying each value of X by
the frequency of its occurrence, adding together these products, and then
dividing by the total number of measurements. Consider the following
measurements: 11, 11, 12, 12, 12, 13, 13, 13, 13, 13, 14, 14, 15, 15, 15, 16,
16, 17, 17, 18. The value 11 occurs with a frequency of 2, 12 with a
frequency of 3, 13 with a frequency of 5, and so on. These data may be
written as follows:

    $X_i$    $f_i$    $f_i X_i$
    18         1        18
    17         2        34
    16         2        32
    15         3        45
    14         2        28
    13         5        65
    12         3        36
    11         2        22
    Total     20       280


This is a frequency distribution with a class interval of 1. The symbol
$f_i$ is used to denote the frequency of occurrence of the particular value
$X_i$. Multiplying each value $X_i$ by the frequency of its occurrence and
adding together the products $f_i X_i$, we obtain the sum 280. The
arithmetic mean is then 280 divided by 20, or 14.0.
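The same calculation can be sketched in Python. The sketch below tallies the 20 measurements into frequencies and then weights each distinct value by its frequency; the function name `mean_from_frequencies` is hypothetical and chosen only for this illustration.

```python
from collections import Counter

def mean_from_frequencies(freq):
    """freq maps each distinct value X_i to its frequency f_i."""
    n = sum(freq.values())                         # N, the number of measurements
    total = sum(x * f for x, f in freq.items())    # sum of the products f_i * X_i
    return total / n

data = [11, 11, 12, 12, 12, 13, 13, 13, 13, 13,
        14, 14, 15, 15, 15, 16, 16, 17, 17, 18]
freq = Counter(data)                               # e.g. freq[13] == 5
print(mean_from_frequencies(freq))                 # 280 / 20 = 14.0
```

Weighting by frequency gives the same result as summing all 20 measurements directly and dividing by 20.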

In general, where $X_1$, $X_2$, $X_3$, . . . , $X_k$ occur with frequencies
$f_1$, $f_2$, $f_3$, . . . , $f_k$, where k is the number of different values of X, the
arithmetic mean

$$\bar{X} = \frac{\sum_{i=1}^{k} f_i X_i}{N} = \frac{f_1 X_1 + f_2 X_2 + f_3 X_3 + \cdots + f_k X_k}{N} \tag{3.2}$$

Observe that here the summation is over k terms, the number of different
values of the variable X. Observe also that $\sum_{i=1}^{N} X_i = \sum_{i=1}^{k} f_i X_i$. The
above discussion suggests a simple method for calculating the mean from
data grouped in the form of a frequency distribution regardless of the
size of the class interval. The mid-point of the interval may be used to
represent all values falling within the interval. We assume that the
variable X takes values corresponding to the mid-points of the intervals, and
these are weighted by the frequencies. We multiply the mid-points of
the intervals by the frequencies, sum these products, and divide this sum
by N to obtain the mean. More explicitly, the steps involved are as
follows. First, calculate the mid-points of all intervals. Second, multiply
each mid-point by the corresponding frequency. Third, sum the
products of mid-points by frequencies. Fourth, divide this sum by N to
obtain the mean. To illustrate, consider Table 3.1.

Table 3.1

Calculating the mean for distribution of test scores: long method

    1           2            3            4
    Class       Mid-point    Frequency    Frequency × mid-point
    interval    $X_i$        $f_i$        $f_i X_i$
    45-49       47            1              47
    40-44       42            2              84
    35-39       37            3             111
    30-34       32            6             192
    25-29       27            8             216
    20-24       22           17             374
    15-19       17           26             442
    10-14       12           11             132
    5-9          7            2              14
    0-4          2            0               0
    Total                    76           1,612

    $\sum_{i=1}^{k} f_i X_i = 1{,}612$        $\bar{X} = 1{,}612/76 = 21.21$





The mid-points of the intervals $X_i$ appear in col. 2. The frequencies
$f_i$ appear in col. 3. The products of the mid-points by the frequencies
$f_i X_i$ are shown in col. 4. The sum of these products $\sum_{i=1}^{k} f_i X_i$ is 1,612, N is
76, and the mean $\bar{X}$ is obtained by dividing 1,612 by 76 and is 21.21.
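The long method of Table 3.1 can be sketched as follows. The sketch assumes each class interval is given by its lower and upper score limits and takes the mid-point as their average; the name `grouped_mean` is an assumption introduced for illustration.

```python
def grouped_mean(intervals, freqs):
    """intervals: list of (lower, upper) score limits; freqs: matching frequencies."""
    midpoints = [(lo + hi) / 2 for lo, hi in intervals]     # step 1
    products = [f * m for f, m in zip(freqs, midpoints)]    # step 2
    n = sum(freqs)
    return sum(products) / n                                # steps 3 and 4

# Class intervals and frequencies as listed in Table 3.1.
intervals = [(45, 49), (40, 44), (35, 39), (30, 34), (25, 29),
             (20, 24), (15, 19), (10, 14), (5, 9), (0, 4)]
freqs = [1, 2, 3, 6, 8, 17, 26, 11, 2, 0]
print(round(grouped_mean(intervals, freqs), 2))             # 21.21
```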

3.4

Change of origin and unit 

A series of measurements may be conceptualized as points on a line meas- 
ured in appropriate units from an origin or zero point. Thus particular 
measurements, say, 48 or 72 in., may be regarded as points 48 and 72 
units, respectively, from a zero origin, the unit of measurement here being 
the inch. It is frequently useful in statistical work to change the origin 
and to represent the measurements as deviations from a new origin. The 
new origin may be chosen arbitrarily, or it may be the arithmetic mean. 
Consider the measurements 7, 13, 22, 9, 11, and 4. Select an arbitrary 
origin, say, 9. The measurements represented as deviations from this 
origin become -2, 4, 13, 0, 2, and -5. The measurements represented
as deviations from the arithmetic mean, in this case 11, are -4, 2, 11,
-2, 0, and -7.

Algebraically, a deviation from any arbitrary origin may be
represented by

$$x'_i = X_i - X_0$$

where $x'_i$ is a deviation of the measurement $X_i$ from an arbitrary origin $X_0$.
A deviation from the arithmetic mean may be represented by

$$x_i = X_i - \bar{X}$$

where $x_i$ is a deviation of the measurement $X_i$ from the mean $\bar{X}$. The
symbol $x_i$ will be used frequently in this book to refer to a deviation from
the arithmetic mean. Both the above expressions are simple
transformations of the measurements involving a change in origin.

Situations arise where a change in unit is involved. To illustrate, we 
may convert inches to feet by dividing by 12, or ounces to pounds by 
dividing by 16. This is a simple change in the unit of measurement. On 
occasion both a change in unit and a change in origin are required. The 
deviations of the measurements 7, 13, 22, 9, 11, 4 about the arbitrary
origin 9 are -2, 4, 13, 0, 2, -5. If these deviations are now divided by
any number, say, 2, a change in unit results and the deviations become
-1, 2, 6.5, 0, 1, and -2.5. In this case the unit of measurement is
twice as large as it was before.

A deviation from any arbitrary origin with a change in unit may be
written as

$$x'_i = \frac{X_i - X_0}{h}$$

where h is the new unit. This expression may be spoken of as a
transformation involving both a change in origin and a change in unit.
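A change of origin and unit is easy to try out numerically. The following Python sketch applies the transformation to the six measurements used in this section; the function name `transform` and its parameter names are assumptions chosen for the example.

```python
def transform(values, origin, unit=1):
    """Express each value as a deviation from `origin`, measured in steps of `unit`."""
    return [(x - origin) / unit for x in values]

measurements = [7, 13, 22, 9, 11, 4]
mean = sum(measurements) / len(measurements)       # 11.0

print(transform(measurements, origin=9))           # [-2.0, 4.0, 13.0, 0.0, 2.0, -5.0]
print(transform(measurements, origin=mean))        # [-4.0, 2.0, 11.0, -2.0, 0.0, -7.0]
print(transform(measurements, origin=9, unit=2))   # [-1.0, 2.0, 6.5, 0.0, 1.0, -2.5]
```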


3.5

An alternative method of calculating the mean
from frequency distributions

A change of origin and unit may be used to reduce the arithmetical labor
in calculating the mean from data grouped in frequency-distribution form.
This method is illustrated with reference to Table 3.2.

In col. 2 the frequencies are recorded. First, select the mid-point of
any class interval as an arbitrary origin, or assumed mean. The selection
of an arbitrary origin near the middle of the distribution simplifies the

Table 3.2

Calculating the mean for distribution of test scores: short method

    1           2            3              4               5              6
    Class       Frequency    Computation    Frequency by    New            Frequency by new
    interval    $f_i$        variable       computation     computation    computation
                             $x'_i$         variable        variable       variable
                                            $f_i x'_i$      $x''_i$        $f_i x''_i$
    45-49        1            5               5              6               6
    40-44        2            4               8              5              10
    35-39        3            3               9              4              12
    30-34        6            2              12              3              18
    25-29        8            1               8              2              16
    20-24       17            0               0              1              17
    15-19       26           -1             -26              0               0
    10-14       11           -2             -22             -1             -11
    5-9          2           -3              -6             -2              -4
    0-4          0           -4               0             -3               0
    Total       76                          -12                             64

    $\sum_{i=1}^{k} f_i x'_i = -12$        $\bar{X} = 22 + 5(-12/76) = 21.21$

    Check: $\sum_{i=1}^{k} f_i x''_i = 64$        $\bar{X} = 17 + 5(64/76) = 21.21$




arithmetic. In the present example the arbitrary origin is taken as the
mid-point of the interval 20 to 24, which is 22, and 0 is written opposite
that interval in col. 3. The mid-point of the interval 25 to 29 is one unit
of class interval above the arbitrary origin, and 1 is written opposite this
interval. The mid-point of the next interval, 30 to 34, is two units of
class interval above the arbitrary origin; hence 2 is written opposite this
interval. The procedure simply amounts to writing +1, +2, +3, and
so on, opposite the intervals above the arbitrary origin, and -1, -2, -3,
and so on, opposite the intervals below the arbitrary origin. These
numbers, which appear in col. 3, are referred to as the computation
variable and are represented by the symbol $x'_i$. They are the deviations of the
mid-points of the class intervals from an arbitrary origin in units of class
interval. Second, multiply the frequencies by the computation variable
with due regard to sign as shown in col. 4. Third, add col. 4 to obtain
$\sum_{i=1}^{k} f_i x'_i$, the sum of deviations about the arbitrary origin in units of class
interval. In the present example this sum is -12. Fourth, divide this
sum by N and multiply the result by h, the class interval. Here we
divide -12 by 76 and multiply the result by 5 to obtain -.79. The
fifth step involves the addition of the quantity thus obtained, -.79, to the
arbitrary origin 22 to obtain the mean. The mean is then 21.21.

Let us summarize the steps involved:

1 Select an arbitrary origin and write down the computation
variable.

2 Multiply the frequencies by the computation variable.

3 Sum these products with due regard to sign.

4 Divide this sum by N and multiply by h, the class
interval.

5 Add the result to the arbitrary origin to obtain the
arithmetic mean.

This procedure is ordinarily accomplished by the application of a
simple formula:

$$\bar{X} = X_0 + h\,\frac{\sum_{i=1}^{k} f_i x'_i}{N} \tag{3.3}$$

where $X_0$ = arbitrary origin
      $f_i$ = frequencies
      $x'_i$ = computation variable
      N = number of cases
      h = class interval





In the present example

$$\bar{X} = 22 + 5 \times \frac{-12}{76} = 21.21$$

To check the calculation, select a new arbitrary origin as shown in
col. 5 of Table 3.2 and repeat the calculation. Note that the difference
between $\sum_{i=1}^{k} f_i x'_i$ and $\sum_{i=1}^{k} f_i x''_i$ in Table 3.2 is equal to N. This provides a
check on the calculation thus far and will hold where the two arbitrary
origins are one class interval removed from each other.
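The short method and its check can be sketched in Python. The sketch assumes that every mid-point lies a whole number of class intervals from the chosen origin, as it does in Table 3.2; the function name `coded_mean` is hypothetical.

```python
def coded_mean(midpoints, freqs, origin, h):
    """Short method: return (mean, sum of f_i * x'_i) for a given arbitrary origin."""
    n = sum(freqs)
    # Computation variable: deviation from the origin in units of class interval.
    coded = [(m - origin) // h for m in midpoints]
    coded_sum = sum(f * c for f, c in zip(freqs, coded))
    return origin + h * coded_sum / n, coded_sum

midpoints = [47, 42, 37, 32, 27, 22, 17, 12, 7, 2]
freqs = [1, 2, 3, 6, 8, 17, 26, 11, 2, 0]

mean_a, sum_a = coded_mean(midpoints, freqs, origin=22, h=5)
mean_b, sum_b = coded_mean(midpoints, freqs, origin=17, h=5)
print(sum_a, round(mean_a, 2))   # -12 21.21
print(sum_b, round(mean_b, 2))   # 64 21.21
print(sum_b - sum_a)             # 76, equal to N
```

Both origins lead to the same mean, and the two coded sums differ by N because the origins are one class interval apart.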


3.6

The mean of combined groups

Consider a group of $n_1$ measurements with mean $\bar{X}_1$ and a set of $n_2$
measurements with mean $\bar{X}_2$. Denote the ith measurement in the first
group by the symbol $X_{i1}$ and the ith measurement in the second group by
the symbol $X_{i2}$. The first subscript identifies the particular
measurement. The second subscript identifies the group. Thus $X_{72}$ would in
this notation identify the seventh measurement in the second group of
measurements. Let $n_1 + n_2 = N$, the total number of measurements in
the two groups. The mean of all the measurements in the two groups
taken together is

$$\bar{X} = \frac{\sum_{i=1}^{n_1} X_{i1} + \sum_{i=1}^{n_2} X_{i2}}{n_1 + n_2} = \frac{n_1 \bar{X}_1 + n_2 \bar{X}_2}{N} \tag{3.4}$$


To illustrate, the mean ot the tour n easuiemcnN 1, S and S is 
The' mean of the >ix mcasuicments 4, 1 >, b, 8 and 1 *» h 7 The mean ot 
all ten measui aments taken togethoi is then 

v - 4 _ x 2 + 6 _ x 7 - < > 

A 10 - ” - 

The above result may be extended to apply to any number of groups.
With more than two groups, say, k, we simply multiply the number of
cases in each group by the group mean, sum the k products thus obtained,
and divide by N, the total number of measurements in the k groups.
Thus with k groups

$$\bar{X} = \frac{\sum_{j=1}^{k} n_j \bar{X}_j}{N} \tag{3.5}$$
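The combined mean can be sketched in Python for any number of groups. The two small groups below are hypothetical data invented for this illustration, and the function name `combined_mean` is likewise an assumption.

```python
def combined_mean(sizes, means):
    """Weight each group mean by its group size, then divide by the total N."""
    total_n = sum(sizes)
    return sum(n * m for n, m in zip(sizes, means)) / total_n

group_1 = [2, 4, 6]     # hypothetical data: n = 3, mean = 4
group_2 = [10, 20]      # hypothetical data: n = 2, mean = 15

sizes = [len(group_1), len(group_2)]
means = [sum(group_1) / len(group_1), sum(group_2) / len(group_2)]
pooled = group_1 + group_2

print(combined_mean(sizes, means))    # 8.4
print(sum(pooled) / len(pooled))      # 8.4, the mean of all five values taken together
```

Note that the combined mean is not, in general, the simple average of the group means (here 9.5), since the groups differ in size.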





3.7

Some properties of the arithmetic mean

The sum of the deviations of all the measurements in a set from their arithmetic
mean is zero. The arithmetic mean of the measurements 7, 13, 22, 9, 11,
and 4 is 11. The deviations of these measurements from this mean are
-4, 2, 11, -2, 0, and -7. The sum of these deviations is zero.

Proof of this result is as follows:

$$\sum_{i=1}^{N} (X_i - \bar{X}) = \sum_{i=1}^{N} X_i - \sum_{i=1}^{N} \bar{X} = N\bar{X} - N\bar{X} = 0$$

Observe that since $\bar{X} = \left(\sum_{i=1}^{N} X_i\right)/N$ it follows that $\sum_{i=1}^{N} X_i = N\bar{X}$. Also
adding $\bar{X}$, the mean, N times is the same as multiplying $\bar{X}$ by N; thus if
$\bar{X}$ is 11 and N is 6, we observe that

11 + 11 + 11 + 11 + 11 + 11 = 6 × 11 = 66

The sum of squares of deviations about the arithmetic mean is less than
the sum of the squares of deviations about any other value. The deviations
of the measurements 7, 13, 22, 9, 11, 4 from the mean 11 are -4, 2, 11,
-2, 0, -7. The squares of these deviations are 16, 4, 121, 4, 0, 49. The
sum of squares is 194. Had any other origin been selected, the sum of
squares of deviations would be greater than the sum of squares about the
mean. Select a different origin, say, 13. The deviations are -6, 0, 9, -4,
-2, -9. Squaring these we have 36, 0, 81, 16, 4, 81. The sum of these
squares is 218, which is greater than the sum of squares about the mean.
Selection of any other origin will demonstrate the same result.

This property of the mean indicates that it is the centroid, or center
of gravity, of the set of measurements. Indeed, the mean is the central
value about which the sum of squares of deviations is a minimum. This
result may be readily demonstrated. Consider deviations from an origin
$\bar{X} + c$, where $c \neq 0$. A deviation of an observation from this origin is

$$X_i - (\bar{X} + c) = (X_i - \bar{X}) - c \tag{3.6}$$

Squaring and summing over N observations we obtain

$$\sum_{i=1}^{N} [X_i - (\bar{X} + c)]^2 = \sum_{i=1}^{N} (X_i - \bar{X})^2 + \sum_{i=1}^{N} c^2 - 2c \sum_{i=1}^{N} (X_i - \bar{X}) \tag{3.7}$$





Because the sum of deviations about the mean is zero, the third term to
the right is zero. Also $c^2$ summed N times is $Nc^2$, and we write

$$\sum_{i=1}^{N} [X_i - (\bar{X} + c)]^2 = \sum_{i=1}^{N} (X_i - \bar{X})^2 + Nc^2 \tag{3.8}$$

This expression states that the sum of squares of deviations about an
origin $\bar{X} + c$ may be viewed as comprised of two parts, the sum of squares
of deviations about the mean $\bar{X}$ and $Nc^2$. The quantity $Nc^2$ is always
positive. Hence the sum of squares of deviations about an origin $\bar{X} + c$
will always be greater than the sum of squares about $\bar{X}$. Thus the sum of
squares of deviations about the arithmetic mean is less than the sum of
squares of deviations about any other value.
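Both properties can be checked numerically with the six measurements used above. The Python sketch below is illustrative only; the helper name `sum_of_squares` is an assumption. With the origin 13 we have c = 2, so Eq. (3.8) predicts a sum of squares of 194 + 6 × 4 = 218.

```python
def sum_of_squares(values, origin):
    """Sum of squared deviations of the values about a chosen origin."""
    return sum((x - origin) ** 2 for x in values)

measurements = [7, 13, 22, 9, 11, 4]
n = len(measurements)
mean = sum(measurements) / n                        # 11.0

# First property: deviations about the mean sum to zero.
print(sum(x - mean for x in measurements))          # 0.0

# Second property and Eq. (3.8): shifting the origin by c adds N * c**2.
c = 2                                               # origin 13 = mean + 2
ss_mean = sum_of_squares(measurements, mean)
ss_shifted = sum_of_squares(measurements, mean + c)
print(ss_mean, ss_shifted, ss_mean + n * c ** 2)    # 194.0 218.0 218.0
```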

Any mean calculated on a sample of size N is an estimate of a
population mean, which is the value obtained where it is possible to measure all
members of the population. The mean has the property that for most
distributions it is a better, or more accurate, or more efficient, estimate of
the population mean than other measures of central location such as the
median and the mode. This is one reason why it is most frequently used.
Proof of this result is beyond the scope of this book.

Reference has been made to a number of properties of the arithmetic
mean. What importance attaches to these properties, or why should
they be discussed? The fact that the sum of deviations about the mean
is zero greatly simplifies many forms of algebraic manipulation. Any
term involving the sum of deviations about the mean will vanish. The
fact that the sum of squares of deviations about the mean is a minimum in
effect implies an alternative definition of the mean; namely, the mean is
that measure of central location about which the sum of the squares is a
minimum. In effect, the mean is a measure of central location in the
least-squares sense. The method of least squares is of considerable importance
in statistics and is used, for example, in the fitting of lines and curves.
The mean may be regarded as a point located by the method of least
squares. The properties pertaining to change of origin and change of unit
are of importance in that they lead to simplified methods of computing
the mean where a fairly large number of observations is involved. The
fact that the sample mean provides a better estimate of a population
parameter than other measures of central location is of primary
importance. Throughout statistics we are concerned with the problem of
making statements about population values from our knowledge of
sample values. Obviously, the more accurate these statements are,
the better.





3.8

The median 

Another commonly used measure of central location is the median. The 
median is a point on a scale such that half the observations fall above it 
and half below it. The observations 2, 7, 16, 19, 20, 25, and 27 are 
arranged in order of magnitude. Here N is an odd number and the median 
is 19; three observations fall above it and three below it. If another 
observation, say, 31, is included, the median is then taken as the arith- 
metic mean of the two middle values 19 and 20; that is, the median is 
(19 + 20)/2, or 19.5. Consider a situation where certain values of the 
variable occur more than once, as, for instance, with the observations
7, 7, 7, 8, 8, 8, 9, 9, 10, 10. The three 8's are assumed to occupy the
interval 7.5 to 8.5. The median is obtained by interpolation. In this 
instance we must interpolate two-thirds of the way into the interval to 
obtain a point above and below which half the observations fall. The 
median is then taken as 7.5 + 0.66 = 8.16. 

With a frequency distribution represented in graphical form, the 
ordinate at the median divides the total area under the curve into two 
equal parts. 
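The two cases described above can be sketched in Python. The first function handles distinct values; the second interpolates within the interval occupied by tied values, under the stated assumption that each value v is spread uniformly over v - 0.5 to v + 0.5. Both function names are hypothetical. The interpolated result is 8.1667 to four places, which the text reports as 8.16.

```python
from collections import Counter

def median_simple(values):
    """Middle value for odd N; mean of the two middle values for even N."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2

def median_interpolated(values, unit=1):
    """Interpolate within the interval v - unit/2 .. v + unit/2 that holds case N/2."""
    counts = Counter(values)
    half = len(values) / 2
    below = 0                                  # cases lying below the current interval
    for v in sorted(counts):
        f = counts[v]
        if below + f >= half:
            return (v - unit / 2) + (half - below) / f * unit
        below += f
    raise ValueError("no observations")

print(median_simple([2, 7, 16, 19, 20, 25, 27]))        # 19
print(median_simple([2, 7, 16, 19, 20, 25, 27, 31]))    # 19.5
print(round(median_interpolated([7, 7, 7, 8, 8, 8, 9, 9, 10, 10]), 4))  # 8.1667
```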


3.9

Calculating the median from frequency distributions 

In calculating the median from data grouped in the form of a frequency 
distribution the problem is to determine a value of the variable such that 
one-half the observations fall above this value and the other half below. 
The method will be illustrated with reference to the data in Table 3.3. 

First, record the cumulative frequencies as shown in col. 3. Second,
determine N/2, one-half the number of cases, in this example 38. Third,
find the class interval in which the 38th case, the middle case, falls. The
38th case falls within the interval 15 to 19, and the exact limits of this
interval are 14.5 and 19.5. Clearly, the 38th case falls very close to the
top of this interval because we know from an examination of our
cumulative frequencies that 39 cases fall below the top of this interval, that is,
below 19.5. Fourth, interpolate between the exact limits of the interval to
find a value above and below which 38 cases fall. To interpolate, observe
that 26 cases fall within the limits 14.5 and 19.5, and we assume that these
26 cases are uniformly distributed in rectangular fashion between these
exact limits. Now to arrive at the 38th, or middle, case, we require 25 of
the 26 cases within this interval, because 2 + 11 + 25 = 38. This
means that we must find a point between 14.5 and 19.5 such that 25 cases
fall below and 1 case above this point. The proportion of the interval we
require is 25/26, which is 25/26 × 5 units of score, or 4.81. We add this to the
lower limit of the interval to obtain the median, which is 14.50 + 4.81,
or 19.31.

Let us summarize the steps involved:

1 Compute the cumulative frequencies.

2 Determine N/2, one-half the number of cases.

3 Find the class interval in which the middle case falls,
and determine the exact limits of this interval.

4 Interpolate to find a value on the scale above and
below which one-half the total number of cases falls.
This is the median.


For the student who has difficulty in following the above, a simple
formula may be employed:

$$\text{Median} = L + \frac{N/2 - F}{f_m}\,h \tag{3.9}$$

where L = exact lower limit of interval containing the median
      F = sum of all frequencies below L
      $f_m$ = frequency of interval containing median
      N = number of cases
      h = class interval


Table 3.3

Frequency distribution of psychological test scores

    1           2            3
    Class       Frequency    Cumulative
    interval                 frequency
    45-49        1           76
    40-44        2           75
    35-39        3           73
    30-34        6           70
    25-29        8           64
    20-24       17           56
    15-19       26           39
    10-14       11           13
    5-9          2            2
    0-4          0            0
    Total       76





In the present example L = 14.5, F = 13, $f_m$ = 26, N = 76, and h = 5.
We then have

$$\text{Median} = 14.5 + \frac{76/2 - 13}{26} \times 5 = 19.31$$
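Formula (3.9) can be sketched in Python as a single pass over the class intervals from lowest to highest. The sketch assumes the intervals are supplied in ascending order by their exact lower limits; the function name `grouped_median` is an assumption made for the example.

```python
def grouped_median(lower_limits, freqs, h):
    """Median = L + ((N/2 - F) / f_m) * h, scanning intervals in ascending order."""
    half = sum(freqs) / 2                  # N/2
    cum_below = 0                          # F, the sum of frequencies below L
    for lower, f in zip(lower_limits, freqs):
        if f > 0 and cum_below + f >= half:
            return lower + (half - cum_below) / f * h
        cum_below += f
    raise ValueError("no observations")

# Exact lower limits and frequencies for Table 3.3, lowest interval first.
lower_limits = [-0.5, 4.5, 9.5, 14.5, 19.5, 24.5, 29.5, 34.5, 39.5, 44.5]
freqs = [0, 2, 11, 26, 17, 8, 6, 3, 2, 1]
print(round(grouped_median(lower_limits, freqs, h=5), 2))   # 19.31
```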


3.10

The mode

Another measure of central location is the mode. In situations where
different values of X occur more than once the mode is the most
frequently occurring value. Consider the observations 11, 11, 12, 12, 12, 13,
13, 13, 13, 13, 14, 14, 14, 15, 15, 15, 16, 16, 17, 17, 18. Here the value
13 occurs 5 times, more frequently than any other value; hence the mode
is 13.

In situations where all values of X occur with equal frequency, where
that frequency may be equal to or greater than 1, no modal value can be
calculated. Thus for the set of observations 2, 7, 16, 19, 20, 25, and 27 no
mode can be obtained. Similarly, the observations 2, 2, 2, 7, 7, 7, 16, 16,
16, 19, 19, 19, 20, 20, 20, 25, 25, 25, 27, 27, 27 do not permit the calculation
of a modal value. All values occur with a frequency of 3.

In the case where two adjacent values of X occur with the same
frequency, which is larger than the frequency of occurrence of other values of
X, the mode may be taken rather arbitrarily as the mean of the two
adjacent values of X. Consider the observations 11, 11, 12, 12, 12, 13, 13, 13,
13, 14, 14, 14, 14, 15, 15, 16, 16, 17, 18. Here the values 13 and 14 both
occur with a frequency of 4, which is greater than the frequency of
occurrence of the remaining values. The mode may be taken as (13 + 14)/2,
or 13.5.

Where two nonadjacent values of X occur such that the frequencies
of both are greater than the frequencies in adjacent intervals, then each
value of X may be taken as a mode and the set of observations may be
spoken of as bimodal. Consider the observations 11, 11, 12, 12, 12, 13,
13, 13, 13, 13, 14, 14, 14, 15, 15, 15, 15, 16, 16, 16, 17, 17, 18. Here the
value 13 occurs five times, and this is greater than the frequency of
occurrence of the adjacent values. Also 15 occurs four times, and this is also
greater than the frequency of occurrence of the adjacent values. This
set of observations may be said to be bimodal.

With data grouped in the form of a frequency distribution the mode is
taken as the mid-point of the class interval with the largest frequency.

The mode is a statistic of limited practical value. It does not lend
itself readily to algebraic manipulation. It has little meaning unless the
number of measurements under consideration is fairly large.
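The simpler cases above can be sketched in Python. The function below returns the value or values with the highest frequency and returns an empty list when all values are equally frequent; it does not check whether tied values are adjacent, and it does not detect the local peaks of a bimodal set, so it covers only part of the discussion. The name `most_frequent` is hypothetical.

```python
from collections import Counter

def most_frequent(values):
    """Values tied for the highest frequency; [] if every value is equally frequent."""
    counts = Counter(values)
    top = max(counts.values())
    if len(counts) > 1 and all(f == top for f in counts.values()):
        return []                              # no modal value can be calculated
    return sorted(v for v, f in counts.items() if f == top)

single = [11, 11, 12, 12, 12, 13, 13, 13, 13, 13, 14, 14, 14,
          15, 15, 15, 16, 16, 17, 17, 18]
tied = [11, 11, 12, 12, 12, 13, 13, 13, 13, 14, 14, 14, 14,
        15, 15, 16, 16, 17, 18]

print(most_frequent(single))                       # [13]
print(most_frequent([2, 7, 16, 19, 20, 25, 27]))   # []
peaks = most_frequent(tied)
print(peaks, sum(peaks) / len(peaks))              # [13, 14] 13.5
```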





3.11

Comparison of the mean, median, and mode

The arithmetic mean may be regarded as an appropriate measure of
central location for interval and ratio variables. All the particular values
of the variable are incorporated in its calculation. The median is an
ordinal statistic. Its calculation is based on the ordinal properties of
the data. If the observations are arranged in order, the median is the
middle value. Its calculation does not incorporate all the particular
values of the variable, but merely the fact of their occurrence above or
below the middle value. Thus the sets of numbers 5, 7, 20, 24, 25, and
10, 15, 20, 52, 53 have the same median, namely 20, although their means
are quite different. The mode, the value or class with the greatest
frequency, is a nominal statistic. Its calculation does not depend on
particular values of the variable or their order, but merely on their frequency
of occurrence.

A comparison of the mean, median, and mode may be made when all 
three have been calculated for the same frequency distribution. If the 
frequency distribution is represented graphically, the mean is a point on 
the horizontal axis which corresponds to the centroid, or center of gravity, 
of the distribution. If a cutout of the distribution is made from heavy
cardboard and balanced on a knife edge, the point of balance will be the
mean. The median is a point on the horizontal axis where the ordinate
divides the total area under the curve into two equal parts. Half the area
falls to the left and half to the right of the ordinate at the median. The
mode is a point on the horizontal axis which corresponds to the highest 
point of the curve. 

If the frequency distribution is symmetrical, the mean, median, and 
mode coincide. If the frequency distribution is skewed, these three 
measures do not coincide. Figure 3.1 shows the mean, median, and mode 
for a positively skewed frequency distribution. We note that the mean 
is greater than the median, which in turn is greater than the mode. If 
the distribution is negatively skewed the reverse relation holds. 

A question may be raised regarding the appropriate choice of a meas- 
ure of central location. In practical situations this question is rarely in 
doubt. The arithmetic mean is usually to be preferred to either the 
median or the mode. It is rigorously defined, easily calculated, and 
readily amenable to algebraic treatment. It provides also a better esti- 
mate of the corresponding population parameter than either the median 
or the mode. 

fig. 3.1 Relation between the mean, median, and mode in a positively skewed
frequency distribution.

The median is, however, to be preferred in some situations.
Observations may occur which appear to be atypical of the remaining
observations in the set. Such observations may greatly affect the value of the
mean. Consider the observations 2, 3, 3, 4, 7, 9, 10, 11, 86. Observation
86 is quite atypical of the remaining observations, and its presence greatly
affects the value of the mean. The mean is 15, a value greater than eight
of the nine observations. The median is 7. Under circumstances such
as this it may prove advisable in treating the data to use statistical
procedures that are based on the ordinal properties of the data in preference
to procedures that incorporate all the particular values of the variable and
may be grossly affected by atypical values. The median, an ordinal
statistic, may under such circumstances be preferred to the mean. In
the above example the set of observations is grossly asymmetrical. If
the distribution of the variables shows gross asymmetry, the median may
be the preferred statistic, because, regardless of the asymmetry of the
distribution, it can always be interpreted as the middle value.
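The effect of the atypical observation can be seen by computing both statistics with and without it. The Python sketch below uses the nine observations from the text and the standard library's `statistics.median`; the variable names are assumptions chosen for the example.

```python
from statistics import median

observations = [2, 3, 3, 4, 7, 9, 10, 11, 86]
without_outlier = observations[:-1]            # drop the atypical value 86

def mean_of(values):
    return sum(values) / len(values)

print(mean_of(observations), median(observations))        # 15.0 7
print(mean_of(without_outlier), median(without_outlier))  # 6.125 5.5
```

Removing the single value 86 moves the mean from 15 to about 6.1, while the median moves only from 7 to 5.5.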

For a strictly nominal variable the mode, the most frequently occur- 
ring class or value, is the only “most typical” statistic that can be used. 
It is rarely used with interval, ratio, and ordinal variables where means 
and medians can be calculated. 


EXERCISES 

1 In 100 rolls of a die the frequencies of the six possible events are as
follows:

    $X_i$    $f_i$
    6        17
    5        14
    4        20
    3        15
    2        15
    1        19

    N = 100

Compute the arithmetic mean for this distribution.

2 The following is a frequency distribution of examination marks:

    Class
    interval    $f_i$
    90-94        1
    85-89        4
    80-84        2
    75-79        8
    70-74        9
    65-69       14
    60-64        6
    55-59        6
    50-54        4
    45-49        3
    40-44        3

    N = 60


Compute the arithmetic mean. 

3 How does the addition of a constant and multiplication by a constant
affect the arithmetic mean?

4 For the following data determine the mean of the combined groups:

a rii = 12 X 1 = 60 b »i A'i = 10 

n, = 8 A 2 = 40 «2 = /.*> A'» = 50 

5 The sum of squares of deviations of 10 observations from a mean of 50 
is 225. What is the sum of squares of deviations from an arbitrary 
origin of 60 [Eq. (3.8)] ? 





6 Compute medians for the following data:

a 3, 7, IS, 26, SI
b 3, 9, 22, 25, 31, 46
c 6, 25, 31, 31, 45, 64
d 12, 19, 24, 24, 36, 42
e 4, 4, 5, 5, 6

7 Compute modes for the following data:

a 2, 2, 5, 5, 5, 6, 6, 6, 7, 8, 12
b 3, 3, 4, 4, 4, 5, 7, 7, 9, 12

8 Compute medians and modes for the data in Exercises 1 and 2 above.


Measures of Variation, 

Skewness, 
and Kurtosis 


4 

-^.i Introduction 

Of great concern to the statistician is the variation in the events of nature. 
The variation of one measurement from another b a persisting character- 
istic of any sample of measurements Measurements of intelligence, eye 
color, reaction time, and skm resistance, for example, exhibit variation in 
any .sample of individuals. Anthropometric measurements such as 
height, weight, diameter of the skull, length of the forearm, and angular 
separation of the metatarsals show variation between individuals. Ana- 
tomical and physiological measurements vary; also the measurements 
made by the physicist, chemist, botanist, and agronomist. Statistics 
has been spoken of as the study of variation. Fisher (1948) has observed, 
“The conception of statistics as the study of variation is the natural out- 
come of viewing the subject as the study of populations; for a population 
of individuals in all respects identical is completely described by a descrip- 
tion of any one individual, together with the number in the group. The 
populations which are the object of statistical study always display varia- 
tion in one or more respects.” The experimental scientist is frequently 
concerned with the different circumstances, conditions, or sources which 
contribute to the variation in the measurements he obtains. The analysis
of variance (Chap. 18) developed by Fisher is an important statistical
procedure whereby the variation in a set of experimental data can be
partitioned into components which may be attributed to different causal 
circumstances. 

How may the variation in any set of measurements be described?
Consider the following measurements for two samples: 

Sample A 10 12 15 18 20 

Sample B 2 8 15 22 28 

We note that the two samples have the same mean, namely, 15. Simple
inspection indicates, however, that the measurements in sample B are
more variable than those in sample A; they differ more one from another.


Among the possible measures used to describe this variation are the 
range, the mean deviation, and the standard deviation. The most 
important of these is the standard deviation. 


4.2

The range 

The range is the simplest measure of variation. In any sample of meas- 
urements the range is taken as the difference between the largest and 
smallest measurements. The range for the measurements 10, 12, 15, 18, 
and 20 is 20 minus 10, or 10. The range for the measurements 2, 8, 15, 
22, and 28 is 28 minus 2, or 20. The measurements in the second set
quite clearly exhibit greater variation than those in the first set, and this 
reflects itself in a much greater range. The range has two disadvantages. 
First, for large samples it is an unstable descriptive measure. Conse- 
quently it should be used with small samples only, preferably 10 or less. 
The sampling variance of the range for small samples is not much greater 
than that of the standard deviation but increases rapidly with increase in 
N. Second, the range is not independent of sample size, except under 
special circumstances. For distributions that taper to zero at the
extremities a better chance exists of obtaining extreme values for large 
than for small samples. Consequently, ranges calculated on samples 
composed of different numbers of cases are not directly comparable. 
Despite these disadvantages the range may be effectively used in the 
application of tests of significance with small samples. For a discussion 
of such tests the reader is referred to Fryer (1954) and Lindzey (1954,
Chap. 8). 


4.3

The mean deviation 

Consider the following measurements:

Sample A      8      8      8      8      8

Sample B      1      4      7     10     13

Sample C      1      5     20     25     29

Intuitively, the measurements in sample A are less variable than those in
B, which in turn are less variable than those in C. Indeed, the measurements
in A exhibit no variation at all. The means of the three samples
are 8, 7, and 16. If we express the measurements as deviations from their
sample means, we obtain

Sample A      0      0      0      0      0

Sample B     -6     -3      0     +3     +6

Sample C    -15    -11     +4     +9    +13





Inspection of these numbers suggests that as variation increases, the 
departure of the observations from their sample mean increases. We 
may use this characteristic to define a measure of variation. One such 
measure is the mean deviation. The mean deviation is the arithmetic 
mean of the absolute deviations from the arithmetic mean. An absolute 
deviation is a deviation without regard to algebraic sign. To obtain the 
mean deviation we simply calculate the deviations from the arithmetic 
mean, sum these, disregarding algebraic sign, and divide by N. For 
sample A above, the mean deviation is zero. For sample B the mean 
deviation is (6 + 3 + 0 + 3 + 6)/5 = 18/5 = 3.6. For sample C the
mean deviation is (15 + 11 + 4 + 9 + 13)/5 = 52/5 = 10.4.

The mean deviation is given in algebraic language by the formula

$$\mathrm{MD} = \frac{\sum |X - \bar{X}|}{N} \tag{4.1}$$

Here $X - \bar{X}$ is a deviation from the mean and $|X - \bar{X}|$ is a deviation
without regard to algebraic sign. The vertical bars mean that signs are ignored.

Hitherto, symbols above and below the summation sign $\Sigma$ have been
used to indicate the limits of the summation. In the above formula for 
the mean deviation these symbols have been omitted, the summation 
being clearly understood to extend over the N members in the sample. 
In this and subsequent chapters symbols indicating the limits of summa- 
tion will, for convenience, be omitted where these are understood clearly 
from the context to extend over N sample members. Where any pos- 
sibility of doubt could exist, the symbols above and below the summation 
sign will be inserted. 

The mean deviation is infrequently used. It is not readily amenable 
to algebraic manipulation. This circumstance stems from the use of 
absolute values. In general, in statistical work the use of absolute values
should be avoided, if at all possible. It is of interest to note that the sum
of absolute deviations about the median is a minimum. Consider the
numbers 1, 5, 20, 25, 29. The median is 20. The sum of absolute deviations
is 19 + 15 + 0 + 5 + 9 = 48. The corresponding sum of deviations
about any other origin, say, 19, will be greater than 48. The sum of
absolute deviations about the origin 19 is 18 + 14 + 1 + 6 + 10 = 49.
The median could be defined as that value about which the sum of abso- 
lute deviations is a minimum. 
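
The mean deviations of samples A, B, and C, and the minimum property of the median, can be checked with the following Python sketch. The function names mean_deviation and sum_abs_dev are illustrative assumptions, not terms from the text.

```python
from statistics import mean, median


def mean_deviation(values):
    """Arithmetic mean of the absolute deviations from the arithmetic mean."""
    m = mean(values)
    return sum(abs(x - m) for x in values) / len(values)


def sum_abs_dev(values, origin):
    """Sum of absolute deviations about an arbitrary origin."""
    return sum(abs(x - origin) for x in values)


sample_a = [8, 8, 8, 8, 8]
sample_b = [1, 4, 7, 10, 13]
sample_c = [1, 5, 20, 25, 29]

for name, data in (("A", sample_a), ("B", sample_b), ("C", sample_c)):
    print(name, mean_deviation(data))

# Sum of absolute deviations about the median (20) and about the origin 19.
print(sum_abs_dev(sample_c, median(sample_c)))  # 48
print(sum_abs_dev(sample_c, 19))                # 49
```

Trying other origins in sum_abs_dev gives sums of 48 or more for sample C, in agreement with the statement that the sum is a minimum about the median.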


4.4

The sample variance and standard deviation 

Some of the deviations about the mean are positive; others are negative. 
The sum of deviations is zero. One method of dealing with the presence
of negative signs is to use the absolute deviations, as in the calculation of
the mean deviation. An alternate, and in general preferable, procedure 
is to square the deviations. One measure of variation that makes use of 
the squares of deviations from the mean is the variance. To calculate 
the variance we add together the squares of deviations from the mean and 
divide by N — 1. To illustrate, the mean of the measurements 1, 4, 7, 
10, and 13 is 7. The deviations from the mean are —6, —3, 0, +3, and 
+6. The squares of these deviations are 36, 9, 0, 9, and 36. The sum 
of squares is 90. We divide this by N - 1 = 5 - 1 = 4 to obtain
90/4 = 22.50. This is the variance.

In algebraic language the sample variance is given by the formula

$$s^2 = \frac{\sum (X - \bar{X})^2}{N - 1} \tag{4.2}$$

Throughout this book the symbol $s^2$ is used to refer to the sample variance.
In the above formula $X - \bar{X}$ is a deviation from the mean. N is the
number of measurements. The above formula defines the variance. It 
has no derivation, but is obtained by a process of plausible reasoning. 
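
As a check on the worked example, the definition can be applied directly in Python. This is a minimal sketch; the function name sample_variance is an assumption made for illustration, and the standard library function statistics.variance, which also divides by N - 1, is used only as a cross-check.

```python
from statistics import variance


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by N - 1."""
    n = len(values)
    m = sum(values) / n
    return sum((x - m) ** 2 for x in values) / (n - 1)


data = [1, 4, 7, 10, 13]
print(sample_variance(data))  # 22.5
print(variance(data))         # same value from the standard library
```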

Many textbooks define the sample variance by dividing the sum of
squares of deviations about the mean, $\sum (X - \bar{X})^2$, by N instead of N - 1.
What is the difference between these two procedures? In Chap. 1 a distinction
was made between sample values, or estimates, and population
values, or parameters. The sample variance $s^2$ is an estimate of $\sigma^2$,
the population variance. For certain algebraic reasons when we divide
$\sum (X - \bar{X})^2$ by N - 1, we obtain an unbiased estimate of $\sigma^2$; that is, $s^2$,
thus defined, will show no systematic tendency to be either greater than
or less than $\sigma^2$. If, however, we divide $\sum (X - \bar{X})^2$ by N, and not N - 1,
the quantity thus obtained is a biased estimate of $\sigma^2$, and will show a systematic
tendency to be less than $\sigma^2$. In many situations it is advantageous
to use an unbiased estimate of $\sigma^2$, and for this reason we have defined
the variance using N - 1 in the denominator.
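
The meaning of "unbiased" can be illustrated by complete enumeration rather than by algebra. The Python sketch below rests on assumptions that are not in the text: the five measurements 1, 4, 7, 10, 13 are treated as an entire population, and every possible ordered sample of three measurements drawn with replacement is taken to be equally likely. Exact fractions are used so that no rounding enters.

```python
from fractions import Fraction
from itertools import product


def sum_sq_dev(values):
    """Sum of squared deviations about the mean, as an exact fraction."""
    m = Fraction(sum(values), len(values))
    return sum((x - m) ** 2 for x in values)


population = [1, 4, 7, 10, 13]  # assumed here to be the whole population
sigma2 = sum_sq_dev(population) / len(population)  # population variance

n = 3
samples = list(product(population, repeat=n))  # all 125 ordered samples

# Average over all samples of the estimate using N - 1, and of that using N.
avg_unbiased = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)
avg_biased = sum(sum_sq_dev(s) / n for s in samples) / len(samples)

print(sigma2, avg_unbiased, avg_biased)  # 18 18 12
```

Under these assumptions the estimates using N - 1 average exactly to the population variance, while those using N average to a smaller value, which is the systematic tendency described above.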

While N is the number of measurements or observations, the quantity
N - 1 in the definition of the variance is the number of deviations
about the mean that are free to vary. To illustrate, consider the measurements
7, 8, 15. The mean is 10, and the deviations about the mean
are -3, -2, +5. The sum of deviations about the mean is zero; that is,
(-3) + (-2) + (5) = 0. Because this is so, if any two of the deviations
are known, the third deviation is fixed. It cannot vary. In this
example, the sum of squares of deviations about the mean is 

9 + 4 + 25 = 38 

Although this sum of squares is obtained by adding together three squared
deviations, only two of these squared deviations can exhibit freedom of
variation. The number of values that are free to vary is called the number
of degrees of freedom. A quantity of the kind $\sum (X - \bar{X})^2$ is said to
have associated with it N - 1 degrees of freedom, because N - 1 of the
N squared deviations of which it is composed can vary. Some intuitive
plausibility attaches to the idea that in the definition of a measure of
variation we should divide the sum of squares by the number of values
that can exhibit freedom of variation. The concept of degrees of freedom
is a very useful and general concept in statistics, and is elaborated in more
detail elsewhere in this book.

In general throughout this book the sample variance $s^2$ is defined as a
sum of squares of deviations about an arithmetic mean divided by
N - 1 and not N. In some situations convenience and simplicity dictate
the use of N and not N - 1. In some correlational problems it is simply
more convenient to employ a definition of the variance using N and not 
N — 1. These situations are in all instances clearly indicated in the 
text. 

In the above discussion the definition of the variance evolves initially 
from a consideration of deviations about the arithmetic mean. An 
alternative, and perhaps more elemental, approach is to begin by con- 
sidering the differences between each value and every other value. With 
two measurements only, $X_1$ and $X_2$, we may consider the difference
between them, $X_1 - X_2$. With three measurements, $X_1$, $X_2$, and $X_3$,
we may consider the differences $X_1 - X_2$, $X_1 - X_3$, and $X_2 - X_3$. In
general, for N measurements the number of such differences is
N(N - 1)/2. To illustrate, for the measurements 1, 4, 7, 10, and 13 the differences
between each measurement and every other measurement are -3,
-6, -9, -12, -3, -6, -9, -3, -6, -3. Note that the sign of the
difference depends on the order of the measurements. If we obtain the
sum of squares of the differences between each measurement and every
other measurement and divide by the number of such differences, the
result is closely related to the variance; in fact it is simply twice the variance.
In our example the sum of squares of differences is 450. We
divide this by 10 to obtain 45.0, which is seen to be twice the variance,
22.5, as previously calculated. In general, in algebraic notation it may
be shown that


$$2s^2 = \frac{\sum (X_i - X_j)^2}{N(N - 1)/2} \tag{4.3}$$

where the summation is understood to extend over N(N - 1)/2 differences.
This result means that the variance is a descriptive index of how
different each value is from every other value; in fact it is an average of
the squared differences divided by 2.
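
The relation between pairwise differences and the variance can be verified for the measurements 1, 4, 7, 10, and 13 with a few lines of Python. This is an illustrative sketch only; the variable names are assumptions.

```python
from itertools import combinations

data = [1, 4, 7, 10, 13]
n = len(data)
mean = sum(data) / n
variance = sum((x - mean) ** 2 for x in data) / (n - 1)

# Every measurement paired once with every other: N(N - 1)/2 pairs.
pairs = list(combinations(data, 2))
sum_sq_diff = sum((a - b) ** 2 for a, b in pairs)

print(len(pairs))                # 10
print(sum_sq_diff)               # 450
print(sum_sq_diff / len(pairs))  # 45.0
print(2 * variance)              # 45.0
```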

The variance is a statistic in squared units. If $X - \bar{X}$ is a deviation
in feet, then $(X - \bar{X})^2$ is a deviation in feet squared. For many purposes
it is desirable to use a measure of variation which is not in squared units,
but is in units of the original measurements themselves. We obtain this
result by taking the square root of the variance, which is called the sample
standard deviation. It is given by the formula

$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}} \tag{4.4}$$


The variance and the standard deviation are the most important and 
most frequently used measures of variation. To fully grasp the meaning 
of these statistics requires some familiarity with their use in practice. 


4.5

Some illustrative applications 

Our understanding of the nature of the variance and the standard deviation
will be enriched by considering illustrative situations where these
statistics are of interest. Consider a simple experiment designed to 
investigate the effect of a drug on a cognitive task such as coding. An 
experimental group of subjects, who receive the drug, and a control group, 
who do not receive the drug, are used. Each group contains 10 subjects. 
Let us assume the scores on the coding task for the two groups are as 
follows : 

Experimental 5 7 17 31 45 47 68 85 96 99 

Control 29 36 37 42 49 58 62 63 69 70 

The mean score for the experimental group is 50.0, and that for the con- 
trol, 51.5. The investigator might be led to conclude from inspecting 
these means that the drug had little or no effect on the performance of the 
subjects. The standard deviations for the two groups are, respectively, 
33.9 and 14.1, the experimental group being much more variable in per- 
formance than the control group. Quite clearly the treatment is exerting 
a substantial influence on the variation in performance, although its 
influence on level of performance is negligible. In the analysis of experi- 
mental data the investigator must attend to, and if possible interpret, 
differences in the standard deviation, or variance, as well as differences in 
the arithmetic mean. 





4.6 

Calculating the sample variance and the standard 
deviation from ungrouped data 

For purposes of calculation, it is convenient to write the variance and the 
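standard deviation in a different form. The variance may be written as

$$\begin{aligned}
s^2 &= \frac{\sum (X - \bar{X})^2}{N - 1} \\
    &= \frac{\sum (X^2 + \bar{X}^2 - 2X\bar{X})}{N - 1} \\
    &= \frac{\sum X^2 + N\bar{X}^2 - 2N\bar{X}^2}{N - 1} \\
    &= \frac{\sum X^2 - N\bar{X}^2}{N - 1}
\end{aligned}$$

In this derivation note that the summation of $\bar{X}^2$ over N terms is simply
$N\bar{X}^2$; also the summation of $2X\bar{X}$ is $2\bar{X}\sum X = 2N\bar{X}^2$. The standard
deviation is given by

$$s = \sqrt{\frac{\sum X^2 - N\bar{X}^2}{N - 1}}$$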


Thus to calculate the standard deviation using this formula, we sum the
squares of the original observations, subtract from this N times the square 
of the arithmetic mean, divide by N — 1, and then take the square root. 
For example, the five observations 1, 4, 7, 10, and 13 have a mean of 7. 
The squares of these observations are 1, 16, 49, 100, and 169. The sum 
of these squared observations is 335. The variance is then 


$$s^2 = \frac{\sum X^2 - N\bar{X}^2}{N - 1} = \frac{335 - 5 \times 7^2}{5 - 1} = 22.50$$

and the standard deviation is $\sqrt{22.50} = 4.74$.

An alternative formula for the standard deviation which avoids the 
calculation of the arithmetic mean and may, therefore, be useful for cer- 
tain computational purposes is 


$$s = \sqrt{\frac{N\sum X^2 - (\sum X)^2}{N(N - 1)}} \tag{4.5}$$


This formula requires one operation of division only. 

In computing a standard deviation on a calculating machine, the 
measurements are entered on the machine and the sums and sums of 
squares of measurements are obtained in a single operation. This yields 
the information required to calculate the standard deviation. In most 
cases it is advisable to repeat the operation as a check. 
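
The single-pass procedure described above, in which only N, the sum, and the sum of squares are accumulated, can be sketched in Python as follows. The function name sd_from_sums is an assumption introduced for illustration; the last line of the function applies formula (4.5).

```python
from math import sqrt


def sd_from_sums(values):
    """Standard deviation from N, the sum of X, and the sum of X squared."""
    n = 0
    total = 0
    total_sq = 0
    for x in values:  # one pass, as on a calculating machine
        n += 1
        total += x
        total_sq += x * x
    return sqrt((n * total_sq - total ** 2) / (n * (n - 1)))


data = [1, 4, 7, 10, 13]
print(round(sd_from_sums(data), 2))  # 4.74
```

With integer data the arithmetic is exact until the final division. With decimal data held as floating-point numbers, the subtraction of two large and nearly equal quantities can lose accuracy, so a formula based on deviations from the mean may be preferable in that case.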





4.7

Calculating the standard deviation from a 
frequency distribution 

The formula used in calculating the standard deviation s from data
grouped in the form of a frequency distribution is

$$s = h\sqrt{\frac{1}{N - 1}\left[\sum fx'^2 - \frac{(\sum fx')^2}{N}\right]} \tag{4.6}$$

where h = class interval
      f = frequencies
      x' = computation variable

Application of this formula is illustrated with reference to Table 4.1.
First, select an arbitrary origin near the middle of the distribution and
write down the computation variable x'. Second, multiply the frequencies
by the computation variable, with due regard to sign, to obtain
the products fx' as shown in col. 4. Third, multiply the products fx'
of col. 4 by the computation variable x' to obtain the values $fx'^2$ in col. 5.
Fourth, sum cols. 4 and 5 to obtain $\sum fx'$ and $\sum fx'^2$, in this case -12 and
204, respectively. $\sum fx'$ and $\sum fx'^2$ are the sum and sum of squares of
deviations about the arbitrary origin in units of class interval. Fifth,
substitute the values obtained in formula (4.6) above. This formula
involves the conversion of a sum of squares of deviations about an arbitrary
origin, in units of class interval, to a sum of squares of deviations
about the actual mean, in original units. Thus

$$\sum (X - \bar{X})^2 = h^2\left[\sum fx'^2 - \frac{(\sum fx')^2}{N}\right]$$


Table 4.1

Calculating the standard deviation for frequency distribution of
test scores

   1            2              3                4                    5

 Class      Frequency     Computation     Frequency by         Frequency by
interval        f          variable        computation          square of
                              x'             variable           computation
                                               fx'               variable
                                                                  fx'^2

 45-49          1              5                5                   25
 40-44          2              4                8                   32
 35-39          3              3                9                   27
 30-34          6              2               12                   24
 25-29          8              1                8                    8
 20-24         17              0                0                    0
 15-19         26             -1              -26                   26
 10-14         11             -2              -22                   44
  5-9           2             -3               -6                   18
  0-4           0             -4                0                    0

 Total         76                             -12                  204

h = 5      $\sum fx' = -12$
N = 76     $\sum fx'^2 = 204$

$$s = 5\sqrt{\frac{1}{75}\left(204 - \frac{(-12)^2}{76}\right)} = 8.21$$

In summary, the steps are:

1 Select an arbitrary origin and write down the computation
variable.

2 Multiply the frequencies by the computation variable
to obtain the products $fx'$.

3 Multiply these products by the computation variable
to obtain the products $fx'^2$.

4 Sum $fx'$ and $fx'^2$ to obtain $\sum fx'$ and $\sum fx'^2$.

5 Apply formula for calculating s from grouped data.
To check the result the calculation may be repeated
using a different arbitrary origin.
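
The five steps can be followed in a short Python sketch using the frequencies as read from Table 4.1. Several things here are assumptions made for illustration: the function name grouped_sd, the representation of each class interval by its lower limit, and the grouped-data formula written as s = h * sqrt((sum of fx'^2 - (sum of fx')^2 / N) / (N - 1)), which is the form consistent with the sums -12 and 204 quoted for the table.

```python
from math import sqrt

h = 5  # class interval
# (lower limit of class interval, frequency), highest interval first
table = [(45, 1), (40, 2), (35, 3), (30, 6), (25, 8),
         (20, 17), (15, 26), (10, 11), (5, 2), (0, 0)]


def grouped_sd(table, h, origin):
    """Return N, sum of fx', sum of fx'^2, and s for grouped data.

    origin is the lower limit of the interval chosen as arbitrary origin.
    """
    n = sum(f for _, f in table)
    # Computation variable x': number of class intervals from the origin.
    x_prime = [(low - origin) // h for low, _ in table]
    freqs = [f for _, f in table]
    sum_fx = sum(f * x for f, x in zip(freqs, x_prime))
    sum_fx2 = sum(f * x * x for f, x in zip(freqs, x_prime))
    s = h * sqrt((sum_fx2 - sum_fx ** 2 / n) / (n - 1))
    return n, sum_fx, sum_fx2, s


n, sum_fx, sum_fx2, s = grouped_sd(table, h, origin=20)
print(n, sum_fx, sum_fx2, round(s, 2))  # 76 -12 204 8.21

# Check suggested in step 5: a different arbitrary origin gives the same s.
print(round(grouped_sd(table, h, origin=30)[3], 2))  # 8.21
```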


4.8 

Effects of grouping 

In calculating the mean and standard deviation all observations in any 
class interval are assigned a value equal to the mid-point of the interval. 
With distributions that taper off to zero at the extremities, the point of 
concentration of the values within any interval is not the mid-point of the 
interval but is usually a point slightly nearer the mean. Thus the
mean of the original observations within any class interval will tend to be 
a little bit closer to the mean of the distribution as a whole than the mid- 
point of the interval. 

In computing the mean from grouped data, grouping exerts no sys- 
tematic effect because errors resulting from the assumption that the 
observations are concentrated at the mid-point of each interval tend to 
balance, the errors on one side of the mean being positive and those on
the other negative. Thus a mean calculated from grouped data may be
expected to differ very little from that calculated from ungrouped data. 

The standard deviation, however, involves the squaring of deviations 
about the mean. In consequence, the errors of grouping on either side of 
the mean do not tend to cancel each other but add together. Thus a 
standard deviation calculated from grouped data will tend to be larger 
than a standard deviation calculated from the original ungrouped 
observations. A correction, known as Sheppard's correction for grouping,
may be applied to the standard deviation. The formula for this
correction is as follows:

$$s_c = \sqrt{s^2 - \frac{h^2}{12}} \tag{4.7}$$

where $s_c$ = corrected standard deviation
      s = uncorrected standard deviation
      h = class interval

Where the class interval is small, the effects of grouping on the standard
deviation are not great and the corrected value will differ only slightly
from the uncorrected value. If the class interval is large, the effects of
grouping may be substantial. Sheppard's correction is applicable only to
continuous variables whose distributions are roughly normal in form. It
is not applicable to rectangular, J-shaped, or U-shaped distributions.
The correction should be used in all cases where an accurate estimate of
the population standard deviation is required. It should not be used in
the application of certain tests of significance, a point to be discussed in
later chapters.
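
To give an idea of the size of the correction, the Python sketch below applies Sheppard's correction in its usual form, in which h^2/12 is subtracted from the uncorrected variance. The function name is an assumption, and the values s = 8.21 and h = 5 are used purely as an illustration of a grouped standard deviation with a class interval of 5.

```python
from math import sqrt


def sheppard_corrected_sd(s, h):
    """Corrected standard deviation: sqrt(s^2 - h^2 / 12)."""
    return sqrt(s ** 2 - h ** 2 / 12)


print(round(sheppard_corrected_sd(8.21, 5), 2))  # 8.08
```

The corrected value is smaller than the uncorrected value, as expected from the tendency of grouping to enlarge the standard deviation.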


4.9

The effect on the standard deviation of adding or 
multiplying by a constant 

If a constant is added to all the observations in a sample , the standard devia- 
tion remains unchanged. An examiner may conclude, for example, that 
an examination is too difficult. He may decide to add 10 points to all the 
marks assigned. The standard deviation of the original marks will be 
the same as the standard deviation of marks with the 10 points added. 
This result follows directly from the fact that if X is an observation, the
corresponding observation with the constant c added is X + c. If $\bar{X}$ is
the mean of the original observations, the mean with the constant added is
$\bar{X} + c$. A deviation from the mean of the observations with the constant
added is then $(X + c) - (\bar{X} + c)$, which is readily observed to be
equal to $X - \bar{X}$. Since the deviations about the mean are unchanged by
the addition of a constant, the standard deviation will remain unchanged.
To illustrate, by adding a constant, say 5, to the measurements 1, 4, 7, 10,
and 13, we obtain 6, 9, 12, 15, and 18. The mean of the original measurements
is 7, and the mean of the measurements with the constant added is
7 + 5, or 12. The deviations from the mean are in both instances the
same, namely, -6, -3, 0, +3, and +6. The standard deviation in both
instances is 4.74.

If all measurements in a sample are multiplied by a constant, the standard
deviation is also multiplied by the absolute value of that constant. If the
standard deviation of examination marks is 4 and all marks are multiplied
by the constant 3, then the standard deviation of the resulting marks is
3 × 4 = 12. To demonstrate this result, we observe that if $\bar{X}$ is the
mean of a sample of measurements, the mean of the measurements multiplied
by c is $c\bar{X}$. A deviation from the mean is then

$$cX - c\bar{X} = c(X - \bar{X})$$

By squaring, summing over N observations, and dividing by N - 1, we
obtain

$$\frac{\sum (cX - c\bar{X})^2}{N - 1} = \frac{c^2 \sum (X - \bar{X})^2}{N - 1} = c^2 s^2$$

Thus if all measurements are multiplied by a constant c, the variance is
multiplied by $c^2$ and the standard deviation by the absolute value of c.
If c is a negative number, say -3, s is multiplied by the absolute value 3.
By way of illustration, the measurements 1, 4, 7, 10, 13 have a mean of 7,
a variance of 22.50, and a standard deviation of 4.74. If the measurements
are multiplied by the constant 5, we obtain 5, 20, 35, 50, 65. The
mean is now 5 × 7, or 35. The deviations from the mean are -30, -15,
0, +15, +30. Squaring these we obtain 900, 225, 0, 225, 900. The sum
of squares is 2,250, the variance is 562.50, and the standard deviation is
23.72, whereas 5 times the original standard deviation of 4.74 is 23.70.
The slight discrepancy results from the rounding of decimals.
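
Both rules can be confirmed numerically. The sketch below uses the Python standard library function statistics.stdev, which divides by N - 1; the multiplier -3 is an added illustration of the absolute-value rule and the variable names are assumptions.

```python
from statistics import stdev

data = [1, 4, 7, 10, 13]
shifted = [x + 5 for x in data]   # constant 5 added
scaled = [5 * x for x in data]    # multiplied by 5
negated = [-3 * x for x in data]  # multiplied by -3

print(round(stdev(data), 2))     # 4.74
print(round(stdev(shifted), 2))  # 4.74, unchanged by the added constant
print(round(stdev(scaled), 2))   # 23.72, five times the unrounded value
print(round(stdev(negated), 2))  # 14.23, three times the unrounded value
```

Working from the unrounded standard deviation, 4.7434..., removes the small discrepancy between 23.72 and 23.70 noted above.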


4.10 

Standard deviation of the first N integers 

We state without proof that the sum of squares of the first N integers is

$$\frac{N(N + 1)(2N + 1)}{6} \tag{4.8}$$

and that the standard deviation of the first N integers is

$$s = \sqrt{\frac{N(N + 1)}{12}} \tag{4.9}$$

Consider the integers 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. Applying the above
formulas, the sum of squares is 385 and the standard deviation is 3.03.
These results may be readily checked by direct calculation.

Formula (4.9) is obtained directly from the definition of the standard
deviation as $s = \sqrt{\sum (X - \bar{X})^2/(N - 1)}$. If the standard deviation is
defined as $s = \sqrt{\sum (X - \bar{X})^2/N}$, the standard deviation of the first N
integers is $s = \sqrt{(N^2 - 1)/12}$.

These formulas are of particular use in relation to problems involving
ranks (Chap. 14). Where ranks are used, the observations are represented
by the first N integers.
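
The direct calculation mentioned above can be carried out in Python. The closed forms used in this sketch are N(N + 1)(2N + 1)/6 for the sum of squares, sqrt(N(N + 1)/12) for the standard deviation with divisor N - 1, and sqrt((N^2 - 1)/12) with divisor N; the function names are assumptions introduced for illustration.

```python
from math import sqrt
from statistics import pstdev, stdev


def sum_squares_first_n(n):
    """Sum of squares of the integers 1, 2, ..., n."""
    return n * (n + 1) * (2 * n + 1) // 6


def sd_first_n(n):
    """Standard deviation of 1..n using the divisor n - 1."""
    return sqrt(n * (n + 1) / 12)


def sd_first_n_divisor_n(n):
    """Standard deviation of 1..n using the divisor n."""
    return sqrt((n * n - 1) / 12)


integers = list(range(1, 11))
print(sum_squares_first_n(10), sum(i * i for i in integers))  # 385 385
print(round(sd_first_n(10), 2), round(stdev(integers), 2))    # 3.03 3.03
print(round(sd_first_n_divisor_n(10), 2), round(pstdev(integers), 2))
```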

4.11 

The variance of combined groups 

On occasion we know the means and variances of two samples of measurements
and may wish to obtain the variance of the two samples combined.
We may, for example, have means and variances of examination
marks for two classes of university students and may wish to find the
variance of marks for all the individuals in the two classes. Let the number
of cases, mean, and variance for one group be $n_1$, $\bar{X}_1$, and $s_1^2$, and for
the other $n_2$, $\bar{X}_2$, and $s_2^2$. Let $\bar{X}$ and $s^2$ be the mean and variance for the
combined groups. Also, let

$$n_1 + n_2 = N$$

$$\bar{X}_1 - \bar{X} = d_1$$

and

$$\bar{X}_2 - \bar{X} = d_2$$

We state without proof that the variance of the combined group is

$$s^2 = \frac{1}{N - 1}\left[(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + n_1 d_1^2 + n_2 d_2^2\right] \tag{4.10}$$

This formula can be extended from two to any number of groups, say, k.
To illustrate, the seven measurements 1, 6, 8, 10, 13, 18, and 21 have
a mean of 11 and a variance of 48.00; that is, $n_1 = 7$, $\bar{X}_1 = 11$, and
$s_1^2 = 48.00$. The five measurements 1, 4, 7, 10, 13 have a mean of 7 and
a variance of 22.50; that is, $n_2 = 5$, $\bar{X}_2 = 7$, and $s_2^2 = 22.50$. The mean
of all 12 measurements taken together, the combined group, is 9.33. The
quantity $d_1 = 11 - 9.33 = 1.67$ and $d_2 = 7 - 9.33 = -2.33$. The
variance of the combined groups is

$$s^2 = \frac{1}{11}\left[(7 - 1) \times 48 + (5 - 1) \times 22.50 + 7(1.67)^2 + 5(-2.33)^2\right] = 38.61$$

The standard deviation $s = \sqrt{38.61} = 6.21$. The above result may be
checked by direct calculation.
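
The check by direct calculation can be sketched in Python. The function name combined_variance and its argument names are assumptions made for illustration; the body follows the combined-groups formula, with the combined mean obtained as the weighted mean of the two group means.

```python
from statistics import variance


def combined_variance(n1, mean1, var1, n2, mean2, var2):
    """Variance of two groups combined, from their sizes, means, variances."""
    n = n1 + n2
    grand_mean = (n1 * mean1 + n2 * mean2) / n
    d1 = mean1 - grand_mean
    d2 = mean2 - grand_mean
    total = (n1 - 1) * var1 + (n2 - 1) * var2 + n1 * d1 ** 2 + n2 * d2 ** 2
    return total / (n - 1)


group1 = [1, 6, 8, 10, 13, 18, 21]
group2 = [1, 4, 7, 10, 13]

from_formula = combined_variance(7, 11, 48.0, 5, 7, 22.5)
direct = variance(group1 + group2)  # divisor N - 1 on all 12 measurements

print(round(from_formula, 2), round(direct, 2))  # 38.61 38.61
```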

4.12 

Standard scores 

A deviation from the mean divided by the standard deviation is called a
standard score and is represented by the symbol z. Thus

$$z = \frac{X - \bar{X}}{s} = \frac{x}{s} \tag{4.11}$$

Deviation scores $X - \bar{X}$, or x, have a mean of zero and a standard deviation
s. The subtraction of the mean from all measurements in a sample
does not change the standard deviation. Standard scores have zero mean
and unit standard deviation. As previously shown, if all measurements
in a sample are multiplied by a constant, the standard deviation is also
multiplied by that constant. The deviations from the mean $X - \bar{X}$
have a standard deviation s. If all deviations are divided by s, which
amounts to multiplying by the constant 1/s, the standard deviation of the
scores thus obtained is s/s = 1.

To illustrate, the following observations have been expressed in
raw-score, deviation-score, and standard-score form.

Individual      X        x        z

A               3       -7     -1.11
B               6       -4      -.63
C               7       -3      -.47
D               9       -1      -.16
E              15        5       .79
F              20       10      1.58

Sum            60      .00       .00
Mean           10      .00       .00
s            6.32     6.32      1.00

Because standard scores have zero mean and unit standard deviation, 
they are readily amenable to certain forms of algebraic manipulation. 
Many formulations can be derived more conveniently using standard 
scores than using raw or deviation scores. 

The use of standard scores means, in effect, that we are using the 
standard deviation as the unit of measurement. In the above example 
individual A is 1.11 standard deviations, or standard deviation units, 
below the mean, while individual F is 1.58 standard deviation units above 
the mean. 
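
The three forms of score can be generated with the Python sketch below. The raw score of individual D is taken as 9, the value implied by the stated sum of 60 and mean of 10; the dictionary layout and variable names are assumptions made for illustration.

```python
from statistics import mean, stdev

raw = {"A": 3, "B": 6, "C": 7, "D": 9, "E": 15, "F": 20}

scores = list(raw.values())
m = mean(scores)   # 10
s = stdev(scores)  # divisor N - 1, about 6.32

deviation = {k: x - m for k, x in raw.items()}
z = {k: (x - m) / s for k, x in raw.items()}

for k in raw:
    print(k, raw[k], deviation[k], round(z[k], 2))

# Standard scores have zero mean and unit standard deviation.
print(round(mean(list(z.values())), 2), round(stdev(list(z.values())), 2))
```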




Standard scores are frequently used to obtain comparability of
observations obtained by different procedures. Consider examinations
in English and mathematics applied to the same group of individuals and
assume the means and standard deviations to be as follows:

                 $\bar{X}$      s

English          65      8
Mathematics      52     12

In effect, in relation to the performance of the individuals in the group,
a score of 65 on the English examination is the equivalent of a score of
52 on the mathematics examination. To illustrate, a score one standard
deviation above the mean, that is, 65 + 8, or 73, on the English examination
can be considered to be the equivalent of a score one standard deviation
above the mean, that is, 52 + 12, or 64, on the mathematics examination.
If an individual makes a score of 57 on the English examination
and a score of 58 on the mathematics examination, we may compare his
relative performance on the two subjects by comparing his standard
scores. On English his standard score is (57 - 65)/8 = -1.0, and on
mathematics his standard score is (58 - 52)/12 = .5. Thus on English
his performance is one standard deviation unit below the average, while
on mathematics his performance is .5 standard deviation unit above the
average. Quite clearly, this individual did much more poorly in English
than in mathematics relative to the performance of the group of individuals
taking the examinations, although this is not reflected in the
original marks assigned. To attain rigorous comparability of scores, the
distributions of scores on the two tests should be identical in shape. The
meaning of this statement will become clear as we proceed.

The reader should note that the sum of squares of standard scores,
$\sum z^2$, is equal to N - 1. We observe that $z^2 = (X - \bar{X})^2/s^2$; hence

$$\sum z^2 = \frac{\sum (X - \bar{X})^2}{s^2} = \frac{\sum (X - \bar{X})^2}{\sum (X - \bar{X})^2/(N - 1)} = N - 1$$

The reader should note here that if $s^2$ is defined as $\sum (X - \bar{X})^2/N$, the
sum of squares of standard scores is N and not N - 1.
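
Both statements about the sum of squares of standard scores can be checked numerically. In the Python sketch below the six measurements are illustrative values chosen here, and the function name and the divisor_n switch are assumptions, not notation from the text.

```python
from math import sqrt


def standard_scores(values, divisor_n=False):
    """Standard scores, with s computed using N - 1 (default) or N."""
    n = len(values)
    m = sum(values) / n
    ss = sum((x - m) ** 2 for x in values)
    s = sqrt(ss / (n if divisor_n else n - 1))
    return [(x - m) / s for x in values]


data = [3, 6, 7, 9, 15, 20]  # N = 6

sum_z2 = sum(z * z for z in standard_scores(data))
sum_z2_n = sum(z * z for z in standard_scores(data, divisor_n=True))

print(round(sum_z2, 6))    # 5.0, that is, N - 1
print(round(sum_z2_n, 6))  # 6.0, that is, N
```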

4.13 

Advantages of the variance and standard 
deviation as measures of variation 

The variance and standard deviation have many advantages over other 
measures of variation. Much statistical work involves their use. The
variance has certain additive properties, and may on occasion be partitioned
into additive components, each of which may be related to some
causal circumstance. The sample standard deviation is a more stable or 
accurate estimate of the population parameter than other measures of 
variation. It provides a more stable estimate of the standard deviation 
in the population than the sample mean deviation, for example, does of 
the mean deviation in the population. The variance and standard devia- 
tion are more amenable to mathematical manipulation than other meas- 
ures. They enter into formulas for the computation of many types of 
statistics. They are widely used as measures of error. In later dis- 
cussion on sampling statistics the reader will observe that the standard 
error is in effect the standard deviation of errors made in estimating
population parameters from sample values. These errors result from the
operation of chance factors in random sampling. A full appreciation of
the importance and meaning of the variance and standard deviation in
their many ramifications requires considerable familiarity with statistical 
ideas. 


4.14 

Moments about the mean 


The mean and the standard deviation are closely related to a family of 
descriptive statistics known as moments. The first four moments about 
the arithmetic mean are as follows: 


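$$\begin{aligned}
m_1 &= \frac{\sum (X - \bar{X})}{N} = 0 \\
m_2 &= \frac{\sum (X - \bar{X})^2}{N} = \frac{N - 1}{N}\, s^2 \\
m_3 &= \frac{\sum (X - \bar{X})^3}{N} \\
m_4 &= \frac{\sum (X - \bar{X})^4}{N}
\end{aligned} \tag{4.12}$$

In general, the rth moment about the mean is given by

$$m_r = \frac{\sum (X - \bar{X})^r}{N} \tag{4.13}$$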


The term “moment” originates in mechanics. Consider a lever sup- 
ported by a fulcrum. If a force f\ is applied to the lever at a distance jq 
from the origin, then fiJCi is called the moment of the force. Further, if 
a second force ft is applied at a distance the total moment is fix\ + 
f&t. If we square the distances .r, we obtain the second moment; if we 



76 


Measures of variation, skewness, and kurtosis 


chap. 4 


cube them, we obtain the third moment; and so on. When we come to
consider frequency distributions, the origin is the analogue of the fulcrum
and the frequencies in the various class intervals are analogous to forces
operating at various distances from the origin. Observe that the first
moment about the mean is zero and the second moment is $(N - 1)/N$
times the sample variance. The third moment is used to obtain a measure
of skewness, and the fourth moment, a measure of kurtosis.
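
The definition of the rth moment about the mean can be checked numerically. The following is a minimal sketch, not part of the original text; the function names and the five measurements are hypothetical and chosen only for illustration. It confirms that the first moment is zero and that the second moment equals (N - 1)/N times the sample variance.

```python
def moment_about_mean(values, r):
    """Return the r-th moment about the arithmetic mean, sum((X - mean)**r) / N."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** r for x in values) / n


def sample_variance(values):
    """Return s^2, the sum of squared deviations divided by N - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


# Hypothetical measurements, chosen only for illustration.
scores = [1, 2, 3, 4, 10]
n = len(scores)

m1 = moment_about_mean(scores, 1)   # first moment about the mean
m2 = moment_about_mean(scores, 2)   # second moment about the mean
s2 = sample_variance(scores)        # sample variance with divisor N - 1

print(m1)                           # the first moment is zero
print(m2, (n - 1) / n * s2)         # the two values agree up to rounding
```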

4.15

Measures of skewness and kurtosis 

A commonly used measure of skewness may be obtained from the second
and third moments and is defined as

$$
g_1 = \frac{m_3}{m_2\sqrt{m_2}} \tag{4.14}
$$

The rationale of this statistic is based on the observation that when a
distribution is symmetrical, the sum of cubes of deviations above the mean
will balance the sum of cubes of deviations below the mean. Thus when
the distribution is symmetrical, $m_3 = 0$ and $g_1 = 0$. If the distribution
has a long tail to the right, the sum of cubes of deviations above the mean
will be greater than the corresponding sum below the mean. Under these
circumstances, the distribution is positively skewed and $g_1$ is positive.
Conversely, if the distribution is negatively skewed, $g_1$ is negative. The
second moment is used in the denominator of the expression for $g_1$ in
order to make the measure independent of the scale of measurement.

The most acceptable measure of kurtosis is obtained from the second
and fourth moments and is defined as

$$
g_2 = \frac{m_4}{m_2^2} - 3 \tag{4.15}
$$

For the normal distribution, a particular type of symmetrical
distribution, $g_2$ is zero. When $g_2$ is less than zero, the
distribution is flatter on top than the normal distribution. When $g_2$ is
greater than zero, the distribution is more peaked than the normal
distribution. The rationale underlying $g_2$ as a measure of kurtosis is based
on the fact that if two frequency distributions have the same standard
deviation and one is more peaked than the other, the more peaked must,
of necessity, have thicker tails. As a consequence, the sum of the
deviations about the mean raised to the fourth power will be greater for the
more peaked distribution than for the less peaked.
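
The two measures can be computed directly from the moments. The sketch below is illustrative only; the function names and the small data sets are hypothetical. It takes $g_1 = m_3/(m_2\sqrt{m_2})$ and $g_2 = m_4/m_2^2 - 3$ as defined above, and shows that a symmetrical set gives $g_1 = 0$ while a long right tail gives a positive $g_1$.

```python
from math import sqrt


def moment_about_mean(values, r):
    """Return the r-th moment about the arithmetic mean."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** r for x in values) / n


def skewness_g1(values):
    """g1 = m3 / (m2 * sqrt(m2))."""
    m2 = moment_about_mean(values, 2)
    m3 = moment_about_mean(values, 3)
    return m3 / (m2 * sqrt(m2))


def kurtosis_g2(values):
    """g2 = m4 / m2**2 - 3."""
    m2 = moment_about_mean(values, 2)
    m4 = moment_about_mean(values, 4)
    return m4 / m2 ** 2 - 3


# Hypothetical data sets, chosen only for illustration.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 2, 3, 4, 10]
left_tailed = [-x for x in right_tailed]

print(skewness_g1(symmetric))      # zero for a symmetrical set
print(skewness_g1(right_tailed))   # positive: long tail to the right
print(skewness_g1(left_tailed))    # negative: long tail to the left
print(kurtosis_g2(symmetric))      # negative: flatter than the normal
```

Multiplying every measurement by a constant leaves both values unchanged, which is the sense in which the second moment in the denominator makes the measures independent of the scale of measurement.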



EXERCISES 

1 For the measurements 2, .1, 9, 10, 15, 19, compute (a) the range, 
(b) the mean deviation, (c) the sample variance, (d) the standard
deviation. 

2 The variance calculated on a sample of 100 cases is 15. What is the
sum of squares of deviations about the arithmetic mean? 

3 A biased variance estimate calculated on a sample of 5 cases is 10. 
What is the corresponding unbiased variance estimate?

4 Compute the sum of squares about the arithmetic mean, the sample
variance, and the standard deviation for the following frequency
distribution by selecting an arbitrary origin within the interval 20 to 24.

    Class interval     f
    40-44              1
    35-39              0
    30-34              2
    25-29              5
    20-24              4
    15-19              8
    10-14              0
    5-9                3
    0-4                1

    N = 30

5 The variance for a sample of N measurements is 20. What will the 
variance be if all measurements are (a) multiplied by a constant 5 and 
(b) divided by a constant 5? 

6 Show that 

$$
\sum (X - \bar{X})^2 = \sum X^2 - N\bar{X}^2
$$

7 The variance calculated from a frequency distribution with a class 
interval of 0 is 19. What is the variance corrected for grouping
error? 

8 Obtain the variance and the standard deviation of the combined 
groups for the following data: 

a    $n_1 = 6$     $\bar{X}_1 = 20$    $s_1^2 = 50$
     $n_2 = 4$     $\bar{X}_2 = 10$    $s_2^2 = 40$

b    $n_1 = 14$    $\bar{X}_1 = 50$    $s_1^2 = 400$
     $n_2 = 0$     $\bar{X}_2 = 75$    $s_2^2 = 250$



9 Express the measurements 2, 4, 5, 6, 8 in standard-score form. What
is the sum of squares of standard scores? 

10 The mean and standard deviation of marks in a statistics examination 
for a class of 20 students are, respectively, 62 and 10. Three students 
make scores of 50, 75, and 90. What are their corresponding stand- 
ard scores? 

11 Calculate the second, third, and fourth moments for the observations
4, 6, 10, 14, 10. Compute also measures of skewness and kurtosis.



Probability 
and the 
Binomial Distribution 


5.1

Introduction

In experimental work a line of theoretical speculation may lead to the
formulation of a particular hypothesis. An experiment is conducted, and
data obtained. How are the data interpreted? Do the data support
the acceptance or rejection of the hypothesis? What rules of evidence
apply? Questions of this type involve considerations of probability.
The answers are in probabilistic terms. The assertions of the investigator
are not made with certainty but have associated with them some
degree of doubt, however small.

Consider a hypothetical illustration. Two methods for the treatment
of a disease are under consideration. Two groups of 20 patients
suffering from the disease are selected. Method A is applied to one
group, method B to the other. Following a period of treatment, 10
patients in group A and 10 patients in group B show marked
improvement. How may this difference be evaluated? May it be argued from
the data that treatment A is in general superior to treatment B? Here
the investigator proceeds by adopting a trial hypothesis that no difference
exists between the two treatments, that one treatment is no better than
the other. He then estimates the probability of obtaining by random
sampling under this trial hypothesis a difference equal to or greater than
the one observed. If this probability is small, say the chances are less
than 5 in 100, he may consider this sufficient evidence for the rejection
of the trial hypothesis and may be prepared to assert that one method of
treatment is better than the other. If the probability is not small and the
observed difference may be expected to occur quite frequently under the
trial hypothesis, say the chances are 20 in 100, then the evidence does not
warrant the conclusion that one treatment is better than the other.

In general, the interpretation of the data of experiments is in
probabilistic terms. The theory of probability is of the greatest importance in
scientific work where questions about the correspondences between the
deductive consequences of theory and observed data are raised.
Probability theory had its origins in games of chance. It has become basic to
the thinking of the scientist.


5.2

The nature of probability 

Diverse views of the nature of probability may be entertained. The topic
is controversial. No inclusive summary of these different views will be
attempted here. We shall discuss three approaches to probability: (1)
the subjective, or personalistic, (2) the formal mathematical, and (3)
the empirical relative-frequency approach. These different ways of
regarding probability are not incompatible.

The term probability may be used subjectively to refer to an attitude 
of doubt with respect to some future event. For example, the assertions 
may be made that “It will probably rain tomorrow, ” or “The probability 
is small that I shall live to be 90 years old,” or “There is a high
probability that a particular horse will win the Kentucky Derby.”
Frequently, numerical terms are used in making assertions of this kind, such
as, “The odds are even that it will rain tomorrow,” or “I estimate that
the chances are about 9f> in 100 that I shall die before I am 90 years old,”
or “The chances are three to one that a particular horse will win the 
Kentucky Derby.” All such assertions, whether numerical terms are 
used or not, refer to feelings of degrees of doubt or confidence with regard 
to future outcomes. This subjective usage is sometimes spoken of as 
psychological, or personalistic, probability. 

A second usage defines the probability of an event as the ratio of the 
number of favorable cases to the total number of equally likely cases. 
This usage stems from a consideration of games of chance involving cards, 
dice, and coins. For example, on examining the structure of a die the 
assertion may be made that no basis exists for choosing one of the six 
alternatives in preference to another; consequently all six alternatives 
may be considered equally likely. The probability of throwing a particular
result, say, a 3, in a single toss is then 1/6, there being one favorable
case among six equally likely alternatives. This approach to probability
involves a concept of equally likely cases, which has a degree of intuitive
plausibility in relation to cards, dice, and coins. Difficulties present
themselves, however, when we attempt to apply this approach in situations
where it is impossible to delineate cases which can be construed to
be equally likely. These difficulties have led to the argument that
equally likely means the same as equally probable; therefore the definition 
is circular because it defines probability in terms of itself. Arguments 
have been advanced to escape this circularity. These need not detain us.
The difficulty, however, is readily resolved by observing that the concept
of equally likely in this definition of probability is a formal postulate and
is not empirical. In effect we say, “Let us postulate that certain events
are equally likely, and given this postulate let us deduce certain
consequences.” This means that a theory of probability employing this
postulate is a formal mathematical model. It may or may not correspond
to empirical events. It may be demonstrated, however, that this model
does approximate closely certain empirical events and consequently is of
value in dealing with practical problems.

The situation here is somewhat analogous to that in ordinary 
Euclidian geometry. Euclidian geometry is a formal system comprised 
of a set of axioms, or primitive postulates, and their deductive conse- 
quences, called theorems. The proofs of the theorems hold regardless of 
questions of correspondence with the empirical world. We know, however,
on the basis of lengthy experience, that these theorems can be shown
to correspond closely to the world around us. In consequence, Euclidian
geometry provides a valuable model for dealing with problems in
engineering, surveying, building construction, and many other fields. Both
with Euclidian geometry and probability it is useful to draw a clear
distinction between the formal mathematical system and the empirical
events for which the formal system may serve as a model.

A third approach to probability is through a consideration of relative
frequencies. If a series of trials is made, say, $N$, and a given event occurs
$r$ times, then $r/N$ is the relative frequency. This relative frequency may
be considered an estimate of a value $p$. If a longer series of trials is made,
the relative frequency will usually be closer to $p$. The difference between
$r/N$ and $p$ may be made as small as we like by increasing the value of $N$.
The probability $p$ is defined as the limit approached by the relative
frequency as the number of trials is increased. This approach to probability
requires that a population of events be defined. Probability is the relative
frequency in the population. It is a population parameter. The
relative frequency in a sample of observations is an estimate of that
parameter. To illustrate, consider a coin. The population of events
may be regarded as an indefinitely large number of tosses which
theoretically could be made. The proportion of heads $p$ in this population is
the probability of a head. This is often assumed to be 1/2. If the coin is
tossed 100 times and a proportion .47 of heads is obtained, this may be
taken as an estimate of $p$.
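
The relative-frequency idea can be illustrated by simulation. The sketch below is an assumption-laden illustration, not a procedure from the text: the function name, the seed, and the series lengths are hypothetical, and a pseudo-random generator stands in for the tossing of a coin. Any single run is subject to chance, so the relative frequency only tends to lie closer to $p$ as the series of trials lengthens.

```python
import random


def relative_frequency(n_trials, p=0.5, seed=0):
    """Simulate n_trials independent trials and return r/N for an event of probability p."""
    rng = random.Random(seed)   # fixed seed so that the run is repeatable
    occurrences = sum(1 for _ in range(n_trials) if rng.random() < p)
    return occurrences / n_trials


# Longer series of trials usually give relative frequencies closer to p = 0.5.
for n_trials in (100, 10_000, 200_000):
    print(n_trials, relative_frequency(n_trials))
```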

The ways of regarding probability described here, the subjective or
personalistic, the formal mathematical, and the empirical through the
study of relative frequencies, are not incompatible, and indeed it may be
argued that all three must of necessity coexist. While subjective, or
personalistic, probability may be an interesting topic of psychological
inquiry, in practical statistical work use is made mainly of the formal
mathematical and relative-frequency approaches, the latter being the
operational complement of the former.


5.3

Possible outcomes 

Questions of probability frequently involve a consideration of the number
of possible outcomes, sometimes spoken of as a set. For example, in
tossing a single coin, there are two possible outcomes: either a head or a
tail will occur. In tossing two coins, the four possible outcomes are HH,
HT, TH, TT. This means that both coins may be heads, the first a
head and the second a tail, the first a tail and the second a head, or both
tails. In tossing three coins, there are eight possible outcomes, which
may be represented as HHH, HHT, HTH, THH, HTT, THT, TTH,
TTT. In tossing four coins, the number of possible outcomes is 16.
In throwing a single die, the number of possible outcomes is six, that is,
either 1, 2, 3, 4, 5, or 6 may appear. In throwing two dice, the
possible outcomes may be listed as follows:

    11    21    31    41    51    61
    12    22    32    42    52    62
    13    23    33    43    53    63
    14    24    34    44    54    64
    15    25    35    45    55    65
    16    26    36    46    56    66


These are the numbers which may come up in throwing two dice. Both 
dice may come up 1, the first may come up 2 and the second 1, the first 
3 and the second 1, and so on. Thus there are 36 possible outcomes to 
this experiment. In drawing a single card from a deck of 52 cards, the 
number of possible outcomes is 52. In drawing one card from a deck of 
52 cards and another from a different deck, the number of possible out- 
comes is 52 X 52 = 2,704. 
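
The counts of possible outcomes given above can be reproduced by enumeration. The following is a minimal sketch using the Python standard library; the function and variable names are hypothetical and introduced only for illustration.

```python
from itertools import product


def possible_outcomes(faces, n):
    """List every ordered outcome of n objects, each of which shows one of `faces`."""
    return list(product(faces, repeat=n))


two_coins = ["".join(o) for o in possible_outcomes("HT", 2)]
three_coins = possible_outcomes("HT", 3)
four_coins = possible_outcomes("HT", 4)
two_dice = possible_outcomes(range(1, 7), 2)
two_decks = possible_outcomes(range(52), 2)   # one card from each of two decks

print(two_coins)                       # ['HH', 'HT', 'TH', 'TT']
print(len(three_coins), len(four_coins), len(two_dice), len(two_decks))
# 8 16 36 2704
```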

In the illustrative situations described above we may be willing to
assume that the possible outcomes are equally likely. With most coins,
dice, and cards this is a justifiable assumption, the validity of which may
be readily checked by experiment. Under this assumption the probability
of obtaining two heads, HH, in tossing two coins is 1/4. Two heads
may occur once in four equally probable outcomes. We denote this
probability by p(HH) = 1/4. The probability of obtaining three tails in
tossing three coins is p(TTT) = 1/8, there being eight equally likely
outcomes. The probability of obtaining two 6’s in throwing two dice is
p(6,6) = 1/36. This event is seen to occur once in the set of 36 possible,
and equally likely, outcomes.

The illustrative applications above regard a probability as the ratio 
of the number of favorable cases to the total number of equally likely 
cases. This approach may be extended to situations where the possible
outcomes are not equally likely, provided a method exists for
either deducing or estimating the expected relative frequencies of the 
possible outcomes. In rolling two dice, we may decide simply to sum the 
two numbers which appear and to regard this sum as an outcome. The 
set of all possible outcomes, thus defined, consists of the 11 numbers 2, 3, 
4, . . . , 11, 12. These outcomes are not, of course, equally likely. 
Their expected relative frequencies may be obtained, however, from a 
consideration of the 36 possible, equally likely outcomes involved in 
rolling two dice. A 2 can occur only once in 36 equally likely outcomes;
that is, p(2) = 1/36. A 3 can occur twice; that is, p(3) = 2/36. Likewise
p(4) = 3/36, p(5) = 4/36, and so on. Thus the possible outcomes are not
equally likely, but have different relative frequencies or probabilities. 
In this illustrative example we were able to deduce the probabilities 
associated with the possible outcomes from more elementary considera- 
tions involving equally likely cases. In many practical situations this 
is not possible, and the required probabilities must be estimated empir- 
ically. For example, what is the probability that a person chosen at 
random from the population of the city of Boston is over 60 years of age? 
The estimation of this probability requires an empirical method. Pre- 
sumably in this case the required probability can be estimated using 
census data. 
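
The deduction of the probabilities for the sum of two dice from the 36 equally likely outcomes can be sketched as follows. The function name is hypothetical; exact fractions are used so that the results can be compared directly with the values in the text.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def dice_sum_distribution():
    """Return p(sum) for two dice, deduced from the 36 equally likely outcomes."""
    counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
    total = sum(counts.values())   # 36 equally likely outcomes
    return {s: Fraction(c, total) for s, c in sorted(counts.items())}


distribution = dice_sum_distribution()
for total_shown, probability in distribution.items():
    # Express each probability as so many chances in 36.
    print(total_shown, f"{probability * 36}/36")
```

The 11 outcomes 2 to 12 are seen to have unequal probabilities that nevertheless sum to 1.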


5.4

Joint and conditional probabilities 

Consider a population of members classified with regard to two char- 
acteristics A and B , there being three classes, or strata, of A and two 
classes, or strata, of B, as follows:

                A_1     A_2     A_3     Total
    B_1         .40     .15     .05      .60
    B_2         .00     .10     .30      .40
    Total       .40     .25     .35     1.00

The probability that a person is $B_1$ is .60; that is, $p(B_1) = .60$. The
probability that he is $A_1$ is .40; that is, $p(A_1) = .40$. The probability
that he is both $A_1$ and $B_1$ is .40; that is, $p(A_1B_1) = .40$. This latter
probability is a joint probability. It is the probability that a member will
fall simultaneously within two classes. All the probabilities in the body
of the above table are joint probabilities.

Now given the fact that a member is $B_1$, what is the probability that
he is $A_1$, $A_2$, or $A_3$? To answer this question we divide the joint
probabilities $p(A_1B_1) = .40$, $p(A_2B_1) = .15$, and $p(A_3B_1) = .05$ by the
probability $p(B_1) = .60$ to obtain $p(A_1/B_1) = .67$, $p(A_2/B_1) = .25$, and
$p(A_3/B_1) = .08$. These are conditional probabilities. They are the
probabilities of $A_1$, $A_2$, and $A_3$ given $B_1$. Note that the sum of these
three probabilities is 1.00. Likewise, if a member is $B_2$, what is the
probability that he is $A_1$, $A_2$, or $A_3$? Here $p(A_1/B_2) = .00$,
$p(A_2/B_2) = .25$, and $p(A_3/B_2) = .75$. These conditional probabilities may
be written in tabular form as follows:


                A_1     A_2     A_3     Total
    B_1         .67     .25     .08     1.00
    B_2         .00     .25     .75     1.00

Likewise, the conditional probabilities of B given A are

                A_1     A_2     A_3
    B_1        1.00     .60     .14
    B_2         .00     .40     .86
    Total      1.00    1.00    1.00

5.5

The addition and multiplication of probabilities 

In throwing a die six possible events may occur. If we are prepared to
assume, as in the formal mathematical approach to probability, that these
six events are equally likely, then the probability of obtaining any
particular one of the results 1, 2, 3, 4, 5, or 6 in a single throw is 1/6,
the ratio of the number of favorable cases to
the number of equally likely cases. Consider now the probability of
obtaining either a 1, 2, or 3 in a single throw. Since there are now three
favorable cases among six equally likely cases, this probability is readily
observed to be 1/6 + 1/6 + 1/6 = 1/2. This is an application of the addition
theorem of probability. This theorem states that the probability that any
one of a number of mutually exclusive events will occur is the sum of the
probabilities of the separate events. “Mutually exclusive” means that if one
event occurs, the others cannot. To illustrate further, in tossing two
coins four possible events may occur. Both coins may be heads, both
may be tails, the first may be a head and the second a tail, or the first
may be a tail and the second a head. These events exhaust the possible
outcomes. They may be represented as HH, TT, HT, TH. Again, if
we assume these four events to be equally likely, the probability of any
one of the four events is 1/4. By the addition theorem the probability of
either two heads or two tails, that is, HH or TT, is 1/4 + 1/4 = 1/2.

In throwing two dice the number of possible outcomes is 36, and the
probability of any particular outcome, assuming these to be equally likely,
is 1/36, which is the product of the two independent probabilities, or
1/6 × 1/6. This is an application of the multiplication theorem of
probability. This theorem states that the probability of the joint
occurrence of two or more mutually independent events is the product of
their separate probabilities. By mutually independent is meant that the
occurrence of one event does not affect the occurrence of the other
events. To illustrate, the probability of obtaining four heads in four
tosses of a coin is

$$
\tfrac{1}{2} \times \tfrac{1}{2} \times \tfrac{1}{2} \times \tfrac{1}{2} = \tfrac{1}{16}
$$

The probability of drawing the ace, king, and queen of spades in that
order in drawing one card from each of three well-shuffled decks of 52
cards is 1/52 × 1/52 × 1/52 = 1/140,608. The probability of drawing the
ace, king, and queen of spades in that order, and without replacement,
from a single deck of 52 cards is 1/52 × 1/51 × 1/50, or 1/132,600. The
probability that the first card is the ace of spades is 1/52. Having drawn one
card, 51 cards remain, and the probability that the second card is the king
of spades is 1/51. Similarly, the probability that the third card is the queen
of spades is 1/50. The probability of the combined event is the product of
the separate probabilities.
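
The two theorems reduce to a sum and a product, as the following sketch shows. The function names are hypothetical; exact fractions are used so that the printed results can be compared with the values worked out above. The addition function assumes mutually exclusive events, and the multiplication function assumes that each probability supplied already takes account of the events before it.

```python
from fractions import Fraction
from math import prod


def prob_either(probabilities):
    """Addition theorem: sum of the probabilities of mutually exclusive events."""
    return sum(probabilities, Fraction(0))


def prob_joint(probabilities):
    """Multiplication theorem: product of the separate probabilities."""
    return prod(probabilities, start=Fraction(1))


one_two_or_three = prob_either([Fraction(1, 6)] * 3)
two_heads_or_two_tails = prob_either([Fraction(1, 4), Fraction(1, 4)])
four_heads = prob_joint([Fraction(1, 2)] * 4)
three_decks = prob_joint([Fraction(1, 52)] * 3)
one_deck = prob_joint([Fraction(1, 52), Fraction(1, 51), Fraction(1, 50)])

print(one_two_or_three, two_heads_or_two_tails)   # 1/2 1/2
print(four_heads, three_decks, one_deck)          # 1/16 1/140608 1/132600
```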

5.6

Permutations and combinations 

A knowledge of permutations and combinations is useful in dealing with
many problems involving probabilities.

Consider two objects labeled A and B. Two arrangements are possible,
AB and BA. With three objects labeled A, B, and C, six arrangements
are possible. These are ABC, ACB, BAC, BCA, CAB, and CBA.
These arrangements are called permutations. In general, if there are
$n$ distinguishable objects, the number of permutations of these objects
taken $n$ at a time is given by $n!$, or $n$ factorial, which is the product of all
integers from $n$ to 1, or

$$
n(n - 1)(n - 2) \cdots 3 \times 2 \times 1
$$

For $n = 3$, $n! = 3 \times 2 \times 1 = 6$. For $n = 5$,

$$
n! = 5 \times 4 \times 3 \times 2 \times 1 = 120
$$

Consider the number of seating arrangements of eight guests in eight 
chairs at a dinner table. The first guest may sit in any one of eight 
chairs. When the first guest is seated, the second guest may sit in any 
one of the remaining seven chairs. Thus the number of possible arrange- 
ments for the first two guests is 8 X 7 = 56. When the first two guests 
are seated, the third guest may occupy any one of the remaining six
chairs, and so on for the remaining guests. The number of possible seating
arrangements for the eight guests is 8!, or 40,320, a number which may
help explain the indecision of many hostesses. 

Instead of considering the number of ways of arranging $n$ things $n$
at a time, we may consider the number of ways of arranging $n$ things $r$ at
a time, where $r$ is less than $n$. Thus the possible arrangements of the
objects A, B, and C taken two at a time are AB, AC, BA, BC, CA, and
CB. Here we observe that there are three ways of selecting the first
object and two ways of selecting the second. The number of arrangements
is then $3 \times 2 = 6$. Similarly, on considering the number of
arrangements of 10 objects taken three at a time, we observe that there
are 10 ways of selecting the first, 9 ways of selecting the second, 8 ways of
selecting the third. The number of arrangements is then

$$
10 \times 9 \times 8 = 720
$$

In general, the number of permutations of $n$ things taken $r$ at a time is

$$
P_r^n = n(n - 1) \cdots (n - r + 1) = \frac{n!}{(n - r)!} \tag{5.1}
$$

The number of different ways of selecting objects from a set, ignoring
the order in which they are arranged, is the number of combinations.
Given the objects A, B, C, and D, the number of permutations of two
from this set is $4 \times 3 = 12$. The arrangements are AB, BA, AC, CA,
AD, DA, BC, CB, BD, DB, CD, and DC. Note that each pair of objects
occurs in two different orders. If we ignore the order in which each pair
of objects is arranged, we have the number of combinations. Since each
pair occurs in two different orders, the number of combinations is then
$(4 \times 3)/2 = 6$. In general, the number of different combinations of
$n$ things taken $r$ at a time is

$$
C_r^n = \frac{n!}{r!\,(n - r)!} \tag{5.2}
$$

The number of combinations of 10 things taken three at a time is 

$$
\frac{10!}{3!\,7!} = 120
$$

The number of combinations of n things taken n at a time is clearly 
1, because there is only one way of picking all n objects if we ignore the 
order of their arrangement. 
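
Formulas 5.1 and 5.2 can be checked against direct enumeration. In the sketch below the two counting functions are hypothetical names that simply transcribe the formulas; `math.perm`, `math.comb`, and the `itertools` functions are part of the Python standard library (Python 3.8 or later is assumed).

```python
from itertools import combinations, permutations
from math import comb, factorial, perm


def permutations_count(n, r):
    """Formula 5.1: n! / (n - r)!."""
    return factorial(n) // factorial(n - r)


def combinations_count(n, r):
    """Formula 5.2: n! / (r! (n - r)!)."""
    return factorial(n) // (factorial(r) * factorial(n - r))


# Enumerate the arrangements and selections of two objects from A, B, C, D.
arrangements = ["".join(p) for p in permutations("ABCD", 2)]
selections = ["".join(c) for c in combinations("ABCD", 2)]

print(factorial(8))                                          # 40320
print(permutations_count(10, 3), combinations_count(10, 3))  # 720 120
print(len(arrangements), selections)
# 12 ['AB', 'AC', 'AD', 'BC', 'BD', 'CD']
```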


5.7

The binomial distribution 

In tossing 10 coins what is the probability of obtaining 0, 1, 2, . . . , 10
heads? We are required to determine the probability of obtaining 0
heads and 10 tails, 1 head and 9 tails, 2 heads and 8 tails, and so on. Let
us designate the 10 coins by the letters A, B, C, D, E, F, G, H, I, and J.
Let us assume that all 10 coins are unbiased and that the probability of
throwing a head or a tail on a single toss of any coin is 1/2.

Let us attend first to the probability of throwing 0 heads and 10 tails
in tossing all 10 coins. The probability that coin A is not a head is 1/2,
that B is not a head is 1/2, that C is not a head is 1/2, and so on. Therefore,
from the multiplication theorem of probability, the probability that all
10 coins are not heads, or that they are tails, is obtained by taking the
product of ten factors of 1/2; that is, $(\tfrac{1}{2})^{10}$, or 1/1,024. Thus in
tossing 10 coins there is 1 chance in 1,024 of obtaining 0 heads and 10 tails.

Now consider the problem of obtaining one head and nine tails. The
probability that coin A is a head is 1/2. The probability that all the
remaining nine coins are tails is $(\tfrac{1}{2})^9$. Therefore the probability that A is
a head and all other nine coins are tails is $(\tfrac{1}{2})^{10}$. It is readily observed,
however, that one head can occur in 10 different ways. A may be a head
and all other coins tails, B may be a head and all the others tails, and so on.
Since one head can occur in 10 different ways, the probability of obtaining
one head and nine tails is $10(\tfrac{1}{2})^{10} = 10/1,024$. Thus in tossing 10 coins
there are 10 chances in 1,024 of obtaining one head and nine tails.

Determining the probability of obtaining two heads and eight tails
may be similarly approached. The probability that coins A and B are
heads is $(\tfrac{1}{2})^2$. The probability that all the remaining coins are tails is
$(\tfrac{1}{2})^8$. The probability that A and B are heads and all the remaining coins
are tails is $(\tfrac{1}{2})^{10}$. We readily observe, however, that two heads can occur
in quite a number of different ways. This number is the number of
combinations of ten things taken two at a time, $C_2^{10}$, which is
$(10 \times 9)/2 = 45$. Therefore the probability of obtaining two heads and
eight tails is $45(\tfrac{1}{2})^{10}$, or 45/1,024. Similarly, the probability of
obtaining three heads and seven tails is $C_3^{10}(\tfrac{1}{2})^{10} = 120/1,024$.
Likewise, the probability of obtaining four heads and six tails is
$C_4^{10}(\tfrac{1}{2})^{10} = 210/1,024$; and so on. The
probabilities of obtaining different numbers of heads in tossing 10 coins
are then as follows:


    No. of heads     Probability
    10                   1/1,024
     9                  10/1,024
     8                  45/1,024
     7                 120/1,024
     6                 210/1,024
     5                 252/1,024
     4                 210/1,024
     3                 120/1,024
     2                  45/1,024
     1                  10/1,024
     0                   1/1,024


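The above probabilities are the successive terms in the expansion of
the symmetrical binomial $(\tfrac{1}{2} + \tfrac{1}{2})^{10}$. This expansion is

$$
\left(\tfrac{1}{2} + \tfrac{1}{2}\right)^{10} = \left(\tfrac{1}{2}\right)^{10} + 10\left(\tfrac{1}{2}\right)^{10} + \frac{10 \times 9}{1 \times 2}\left(\tfrac{1}{2}\right)^{10} + \frac{10 \times 9 \times 8}{1 \times 2 \times 3}\left(\tfrac{1}{2}\right)^{10} + \cdots + \left(\tfrac{1}{2}\right)^{10}
$$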

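This is a particular case of the symmetrical binomial $(\tfrac{1}{2} + \tfrac{1}{2})^n$, whose
terms are

$$
\left(\tfrac{1}{2} + \tfrac{1}{2}\right)^{n} = \left(\tfrac{1}{2}\right)^{n} + n\left(\tfrac{1}{2}\right)^{n} + \frac{n(n - 1)}{1 \times 2}\left(\tfrac{1}{2}\right)^{n} + \frac{n(n - 1)(n - 2)}{1 \times 2 \times 3}\left(\tfrac{1}{2}\right)^{n} + \cdots + \left(\tfrac{1}{2}\right)^{n} \tag{5.3}
$$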


The symmetrical binomial is a particular case of the general form of the
binomial $(p + q)^n$, where $p$ is the probability that an event will occur and
$q$ is the probability that it will not occur; that is, $p + q = 1$. This
binomial may be written as

$$
(p + q)^n = p^n + np^{n-1}q + \frac{n(n - 1)}{1 \times 2}\,p^{n-2}q^2 + \frac{n(n - 1)(n - 2)}{1 \times 2 \times 3}\,p^{n-3}q^3 + \cdots + q^n \tag{5.4}
$$


The terms of this expansion for $n = 2$, $n = 3$, and $n = 4$ are as follows:

$$
\begin{aligned}
(p + q)^2 &= p^2 + 2pq + q^2 \\
(p + q)^3 &= p^3 + 3p^2q + 3pq^2 + q^3 \\
(p + q)^4 &= p^4 + 4p^3q + 6p^2q^2 + 4pq^3 + q^4
\end{aligned}
$$

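The binomial $(p + q)^n$ can be readily illustrated by considering a
problem involving the rolling of dice. What are the probabilities of
obtaining five, four, three, two, one, and zero 6’s in rolling five dice
presumed to be unbiased? The probability of obtaining a 6 in rolling a single
die is 1/6, whereas the probability of not obtaining a 6 is 5/6. The required
probabilities are given by the six terms of the binomial
$(\tfrac{1}{6} + \tfrac{5}{6})^5$. These terms are

$$
\left(\tfrac{1}{6} + \tfrac{5}{6}\right)^5 = \left(\tfrac{1}{6}\right)^5 + 5\left(\tfrac{1}{6}\right)^4\left(\tfrac{5}{6}\right) + \frac{5 \times 4}{1 \times 2}\left(\tfrac{1}{6}\right)^3\left(\tfrac{5}{6}\right)^2 + \frac{5 \times 4 \times 3}{1 \times 2 \times 3}\left(\tfrac{1}{6}\right)^2\left(\tfrac{5}{6}\right)^3 + 5\left(\tfrac{1}{6}\right)\left(\tfrac{5}{6}\right)^4 + \left(\tfrac{5}{6}\right)^5
$$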

Thus the probabilities of obtaining five, four, three, two, one, and zero
6’s in rolling five dice are calculated as

    No. of 6’s     Probability
    5                  1/7,776
    4                 25/7,776
    3                250/7,776
    2              1,250/7,776
    1              3,125/7,776
    0              3,125/7,776


There is one chance in 7,776 of obtaining five 6’s, 25 chances in 7,776
of obtaining four 6’s, and so on. This distribution is seen to be
asymmetrical.
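
The terms of the binomial can be generated for any $n$ and $p$. The following is a minimal sketch; the function names are hypothetical, and exact fractions are used so that the output can be set beside the two tables above. Each term is computed as the number of combinations of $n$ things taken $r$ at a time multiplied by $p^r q^{n-r}$.

```python
from fractions import Fraction
from math import comb


def binomial_term(n, r, p):
    """Probability of exactly r occurrences in n trials: C(n, r) * p**r * q**(n - r)."""
    p = Fraction(p)
    q = 1 - p
    return comb(n, r) * p ** r * q ** (n - r)


def binomial_distribution(n, p):
    """Map each number of occurrences, from n down to 0, to its probability."""
    return {r: binomial_term(n, r, p) for r in range(n, -1, -1)}


coins = binomial_distribution(10, Fraction(1, 2))   # heads in tossing 10 coins
dice = binomial_distribution(5, Fraction(1, 6))     # 6's in rolling five dice

for r, probability in dice.items():
    # Express every probability over the common denominator 6**5 = 7,776.
    print(r, f"{probability * 6 ** 5}/{6 ** 5}")
```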

Any term in the binomial expansion may be written as

$$
C_r^n\, p^r q^{n-r} = \frac{n!}{r!\,(n - r)!}\, p^r q^{n-r} \tag{5.5}
$$

where $C_r^n$ is the number of combinations of $n$ things taken $r$ at a time.
Thus the probability of obtaining three heads in 10 tosses of a coin is

$$
\frac{10!}{3!\,(10 - 3)!}\left(\tfrac{1}{2}\right)^3\left(\tfrac{1}{2}\right)^7 = \frac{120}{1{,}024}
$$

The coefficients $C_r^n$ in any expansion are

$$
1,\quad n,\quad \frac{n(n - 1)}{1 \times 2},\quad \frac{n(n - 1)(n - 2)}{1 \times 2 \times 3},\quad \ldots
$$

These coefficients may be rapidly obtained for different values of $n$ from
what is known as Pascal’s triangle. The coefficients for different values
of $n$ are written in rows in the form of a triangle as shown in Table 5.1.
Each interior number in any row is the sum of the two numbers to the left
and right on the row above. This device is very useful in generating expected
frequencies and probabilities. For example, for $n = 10$, the entries in
the triangle are the expected frequencies of heads, or tails, in tossing 10
coins 1,024 times. The required probabilities in this case are obtained
by dividing the frequencies by 1,024.
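
The rule for building Pascal’s triangle translates directly into a short routine. This is an illustrative sketch; the function name is hypothetical. Each row begins and ends with 1, and each interior entry is the sum of the two entries above it.

```python
def pascal_triangle(max_n):
    """Return the rows of Pascal's triangle for n = 0, 1, ..., max_n."""
    rows = [[1]]
    for _ in range(max_n):
        previous = rows[-1]
        # Interior entries are sums of adjacent pairs in the row above.
        interior = [previous[i] + previous[i + 1] for i in range(len(previous) - 1)]
        rows.append([1] + interior + [1])
    return rows


triangle = pascal_triangle(10)
for n, row in enumerate(triangle):
    print(n, row)

# For n = 10 the entries sum to 1,024, so dividing each entry by 1,024
# gives the probabilities of 0, 1, ..., 10 heads in tossing 10 coins.
probabilities = [entry / sum(triangle[10]) for entry in triangle[10]]
```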


5.8

Properties of the binomial 

For the symmetrical binomial, where $p = q = \tfrac{1}{2}$, the mean, variance,
skewness, and kurtosis are

$$
\begin{aligned}
\mu &= n/2 \\
\sigma^2 &= n/4 \\
\gamma_1 &= 0 \\
\gamma_2 &= -2/n
\end{aligned}
\tag{5.6}
$$


Here we use the symbols $\mu$, $\sigma^2$, $\gamma_1$, and $\gamma_2$ instead of $\bar{X}$, $s^2$, $g_1$, and $g_2$,


Table 5.1
Pascal’s triangle

     n
     0     1
     1     1  1
     2     1  2  1
     3     1  3  3  1
     4     1  4  6  4  1
     5     1  5  10  10  5  1
     6     1  6  15  20  15  6  1
     7     1  7  21  35  35  21  7  1
     8     1  8  28  56  70  56  28  8  1
     9     1  9  36  84  126  126  84  36  9  1
    10     1  10  45  120  210  252  210  120  45  10  1



because the binomial is a theoretical distribution; $\mu$, $\sigma^2$, $\gamma_1$, and $\gamma_2$ may
be viewed as population parameters rather than sample estimates. The
above formulas may be illustrated by considering the tossing of five coins
32 times. The expected frequencies of zero, one, two, three, four, and
five heads are 1, 5, 10, 10, 5, 1. These frequencies are the coefficients of
the expansion $(\tfrac{1}{2} + \tfrac{1}{2})^5$. Using the formula $\mu = n/2$, the mean is $\mu = 2.5$.
This is the expected mean number of heads in tossing five unbiased coins
32 times. The variance of the distribution is $\sigma^2 = n/4 = 5/4 = 1.25$.
Because the distribution is symmetrical, the measure of skewness $\gamma_1 = 0$.
The measure of kurtosis $\gamma_2 = -2/n = -2/5 = -.40$. Note that as $n$
increases in size, $\gamma_2$ becomes smaller in absolute value. As $n$ increases
in size, $\gamma_2$ approaches zero as a limit.

The above values may be obtained by direct calculation. Denote
the number of heads by $X$, the frequencies by $f$, and a deviation from the
mean of $X$ by $x$.


    X        f     fX        x      fx^2       fx^3        fx^4
    5        1      5     2.50      6.25     15.625     39.0625
    4        5     20     1.50     11.25     16.875     25.3125
    3       10     30      .50      2.50      1.250       .6250
    2       10     20     -.50      2.50     -1.250       .6250
    1        5      5    -1.50     11.25    -16.875     25.3125
    0        1      0    -2.50      6.25    -15.625     39.0625

    Total   32     80              40.00       .000    130.0000


The arithmetic mean, $\mu$, of this distribution is $\Sigma fX/N = 80/32 = 2.50$.
The variance, $\sigma^2$, is $\Sigma fx^2/N = 40/32 = 1.25$. $\sigma^2$ is here defined as $\Sigma fx^2/N$,
and not $\Sigma fx^2/(N - 1)$. Here we are dealing with a theoretical model,
and not a sample value. The skewness $\gamma_1$ is readily seen to be zero because
the third moment $\Sigma fx^3/N$ is zero. The fourth moment is

$$\Sigma fx^4/N = 130/32 = 4.0625$$

and the kurtosis is $\gamma_2 = 4.0625/1.25^2 - 3 = -.40$. Here N denotes the
number of tosses, or observations, whereas n denotes the number of coins.
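The direct calculation for five coins can be checked with a short Python sketch. The function name `moments_from_frequencies` is hypothetical, and the sketch assumes the usual moment definitions of skewness (third moment over the variance raised to the power 3/2) and kurtosis (fourth moment over the squared variance, less 3).

```python
from math import comb

def moments_from_frequencies(n):
    """Mean, variance, skewness, and kurtosis of heads in tossing n unbiased coins."""
    X = range(n + 1)                          # possible numbers of heads
    f = [comb(n, k) for k in X]               # expected frequencies
    N = sum(f)                                # number of tosses, 2**n
    mean = sum(fi * xi for fi, xi in zip(f, X)) / N
    x = [xi - mean for xi in X]               # deviations from the mean
    m2 = sum(fi * d ** 2 for fi, d in zip(f, x)) / N   # divisor N, not N - 1
    m3 = sum(fi * d ** 3 for fi, d in zip(f, x)) / N
    m4 = sum(fi * d ** 4 for fi, d in zip(f, x)) / N
    return mean, m2, m3 / m2 ** 1.5, m4 / m2 ** 2 - 3

mean, variance, skewness, kurtosis = moments_from_frequencies(5)
```

For n = 5 the sketch gives a mean of 2.5, a variance of 1.25, zero skewness, and a kurtosis of -.40, in agreement with the formulas for the symmetrical binomial.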

In general the mean, variance, skewness, and kurtosis of any binomial
distribution are given by

$$\mu = np \qquad \sigma^2 = npq \qquad \gamma_1 = \frac{q - p}{\sqrt{npq}} \qquad \gamma_2 = \frac{1 - 6pq}{npq} \qquad (5.7)$$





In tossing an unbiased die, the probability p of throwing a 6 is 1/6 and the
probability q of not throwing a 6 is 5/6. The expected probability dis-
tribution of 6's in tossing 10 dice is given by the terms of the expansion
$(\tfrac{1}{6} + \tfrac{5}{6})^{10}$. The mean of this distribution is $\mu = np = 10/6 = 1.667$.
The variance is $\sigma^2 = npq = 10 \times \tfrac{1}{6} \times \tfrac{5}{6} = 1.389$. The skewness
$\gamma_1 = .566$ and the kurtosis $\gamma_2 = .12$.
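The general formulas can be evaluated for the dice example with a minimal Python sketch. The function name `binomial_summary` is hypothetical. The sketch takes the skewness as (q - p) divided by the square root of npq, the sign convention under which a distribution with p less than 1/2 has positive skewness.

```python
from math import sqrt

def binomial_summary(n, p):
    """Mean, variance, skewness, and kurtosis of a binomial with n trials."""
    q = 1 - p
    mean = n * p
    variance = n * p * q
    skewness = (q - p) / sqrt(variance)      # positive when p < 1/2
    kurtosis = (1 - 6 * p * q) / variance
    return mean, variance, skewness, kurtosis

# ten dice, a "success" being the appearance of a 6
mean, variance, skewness, kurtosis = binomial_summary(10, 1 / 6)
```

With p = 1/2 the same function reduces to the values for the symmetrical binomial: n/2, n/4, 0, and -2/n.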


5.9

A hypothetical experiment 

The binomial distribution is frequently used as a model in evaluating
experimental results. Such uses of the binomial may be illustrated with
reference to a hypothetical experiment.

An individual asserts that he has certain psychic powers which enable
him to predict the outcome of future events. An experiment is arranged
involving the tossing of a coin. The individual is required to predict the
outcome in 10 tosses. If we operate on the working hypothesis that the
individual possesses no powers of the type claimed, the probability of a
correct prediction by chance alone in a single toss of the coin is 1/2. From
the binomial expansion $(\tfrac{1}{2} + \tfrac{1}{2})^{10}$ we can ascertain the probabilities of
different numbers of correct predictions. Thus the probability of the
individual successfully predicting the outcome in all 10 trials by chance
alone is 1/1,024, or .00098. The probability of nine successful predictions
and one failure is 10/1,024, or .00977. The probability of eight successful
predictions and two failures is 45/1,024, or .04395, and so on. The
probability of nine or more successful predictions is

.00977 + .00098 = .01075

and the probability of eight or more successful predictions is

.04395 + .00977 + .00098 = .05470

Now clearly, before undertaking the experiment, some agreement must
be reached regarding the number of correct predictions we are prepared
to accept as evidence for the rejection of the hypothesis that the indi-
vidual possesses no powers of the type claimed.

We may agree arbitrarily that if the results obtained in the experi-
ment could have occurred by chance with a small probability only, say,
equal to or less than .05, then these results would be accepted as at least
not incompatible with the claims for psychic powers. We observe that
the probability of eight or more correct predictions by chance alone is
.05470. This is greater than the .05 probability we have agreed to
accept; consequently eight correct predictions would in this case not be
considered sufficient evidence. The only possibilities here which would





prove acceptable within the criterion adopted are nine or ten correct 
predictions. 

The experiment is conducted; seven correct predictions and three 
failures are obtained. The probability of seven or more correct predic- 
tions occurring by chance alone in ten trials may be calculated from the 
binomial distribution and is 176/1,024, or .17189. Thus there are about
17 chances in 100 of obtaining a result by ordinary guessing equal to or 
better than the one observed. In consequence, the experimental results 
provide no acceptable basis for rejecting the working hypothesis that the 
individual possesses no powers of the type claimed. 
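The chance probabilities used in this experiment can be reproduced with a short Python sketch. The function name `prob_at_least` is hypothetical. Exact fractions are used so that the counts out of 1,024 remain visible.

```python
from fractions import Fraction
from math import comb

def prob_at_least(k, n=10):
    """Chance probability of k or more correct predictions in n tosses."""
    favourable = sum(comb(n, r) for r in range(k, n + 1))
    return Fraction(favourable, 2 ** n)

nine_or_more = prob_at_least(9)      # 11/1,024
eight_or_more = prob_at_least(8)     # 56/1,024
seven_or_more = prob_at_least(7)     # 176/1,024
```

Under the .05 criterion adopted above, only `nine_or_more` falls at or below the agreed probability; `eight_or_more` and `seven_or_more` both exceed it.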

Let us suppose that the individual had made 10 correct predictions. 
Could we reasonably argue from this result that the individual in question 
did in fact possess psychic power? Quite clearly, such a result is not 
incompatible with the assertion of psychic power and provides no basis 
for rejecting that assertion. We observe, however, that circumstances 
other than the possession of psychic power may possibly have led to the 
result obtained; that is, alternative explanations of the results may be 
possible. 

In experimental situations of the type described we would ordinarily
require more than 10 trials. Let us suppose that 1,000 trials had been
made and 560 correct predictions obtained. The probabilities required to
evaluate this result would then be generated by the expansion $(\tfrac{1}{2} + \tfrac{1}{2})^{1,000}$.

Quite clearly, the calculation of the required probabilities directly from
the binomial would involve almost prohibitive arithmetical labor. For-
tunately, a very close approximation to the required probabilities can be
readily obtained from the normal probability distribution, which we shall
now consider.


EXERCISES 

1 In rolling a die, what is the probability of obtaining either a 5 or a 6? 

2 In rolling two dice on one occasion, what is the probability of obtaining 
either a 7 or an 11? 

3 In rolling two dice, what is the probability that neither a 2 nor a 9 will 
appear? 

4 In dealing four cards without replacement from a well-shuffled deck, 
what is the probability of obtaining four aces? 

5 On four consecutive rolls of a die a 6 is obtained. What is the proba-
bility of obtaining a 6 on the fifth roll?

6 An urn contains four black and three white balls. If they are
drawn without replacement, what is the probability of the order
BWBWBWB?





7 In rolling four dice, what is the probability that (a) all four will be
alike, (b) all four will be different, (c) two will be alike and two will be
different?

8 In seating eight people at a table with eight chairs, what is the number 
of possible seating arrangements? 

9 In how many ways can two people seat themselves at a table with
four chairs?

10 In tossing five coins, what is the probability of obtaining fewer than
three heads?

11 A multiple-choice test contains 100 questions. Each question has
five alternatives. If a student guesses all questions, what score might
he expect to obtain?

12 In how many ways can a committee of three be chosen from a group of
five men?

13 Assume that intelligence and honesty are independent. If 10 per
cent of a population is intelligent and 00 per cent is honest, what is the
probability that an individual selected at random is both intelligent
and honest?

14 A husband engages in random verbal behavior for 20 min in an hour. 
His wife is similarly engaged for 30 min in that hour. Neither listens 
to the other. Estimate the number of minutes of silence in the hour. 

15 A coin is tossed 10 times. What is the probability that the third head 
will appear on the tenth toss? 

16 The chances are three out of four that the weather tomorrow will be
like the weather today. Today is Sunday and is a sunny day. What
is the probability that it will be sunny all week? What is the proba-
bility that it will be sunny until Wednesday and not sunny on Thurs-
day, Friday, and Saturday?

17 What is the expected distribution of heads in tossing six coins 64 
times? 

18 What is the expected distribution of 6’s in rolling six dice 64 times? 

19 What is the probability of obtaining either nine or more heads or three 
or fewer heads in tossing 12 coins? 

20 A man tosses six coins and rolls six dice simultaneously. What is the 
probability of five or more heads and five or more 6's? 



The Normal Curve 


6 


Introduction 


The frequency distributions of many events in nature are found in prac-
tice to be approximated closely by a particular bell-shaped type of curve
known as the normal curve. Errors of measurement and errors made in
estimating population values from sample values are often assumed to be
normally distributed. The frequency distributions of many physical,
biological, and psychological measurements are observed to approximate
the normal form. Because the frequency of occurrence of many events in
nature can be shown empirically to conform fairly closely to the normal
curve, this curve can be used as a model in dealing with problems involv-
ing these events. Before proceeding with a detailed discussion of the
normal curve, let us consider briefly the nature of functions and frequency
curves in general.


6.2 

Functions and frequency curves 

When two variables are so related that the values of one depend on the
values of the other they are said to be functions of each other. A func-
tion is descriptive of change in one variable with change in another. The
area of a circle is a function of the radius, and the volume of a cube is a
function of the length of the edge. Consider the equation Y = bX + a.
This is a linear function. It is the equation for a straight line. Y and X
are variables; b and a are constants. If b and a are known, different values
of X can be substituted in the equation and the corresponding values of Y
obtained. If the paired values of Y and X are plotted on graph paper, Y
on the vertical and X on the horizontal axis, a straight line results. Y and
X bear a functional relation to each other, and this relation is linear. Y is
sometimes spoken of as the dependent and X the independent variable.
A functional relation may be written in the general form Y = f(X).



fig. 6.1 Frequency curve showing area between X = a and X = b


This simply states that Y is some function of X. Here the nature of the
function is not specified.

Consider now the binomial $(p + q)^n$. The terms in the binomial
expansion are the expected relative frequencies or probabilities associated
with particular events. Inspection of formula (5.4) indicates that any
term in the binomial expansion is given by

$$p_r = C_r^n\, p^r q^{n-r} \qquad (6.1)$$

where $p_r$ is the probability of the rth event. This expression is a function.
For fixed values of n, p, and q, different values of r may be substituted on
the right and the corresponding values of $p_r$ obtained. Here $p_r$ is the
dependent and r the independent variable. The variable r is restricted
to the n + 1 values 0, 1, 2, . . . , n; consequently $p_r$ is also restricted to
a fixed number of possible values. The paired values of $p_r$ and r may be
plotted on graph paper, $p_r$ on the vertical and r on the horizontal axis.
The resulting graph is a visual description of the functional relation
between the event r and its relative frequency or probability $p_r$.

In the binomial the variable r is discrete and not continuous. In
tossing 50 coins, for example, the number of heads or tails obtained is a
discrete number. The value of $p_r$ changes from r to r + 1 by discrete
steps. We observe, however, that as n increases in size we obtain a larger
and larger number of graduations of the distribution and by increasing
the size of n we can make the graduations as fine as we like. By con-
sidering the situation where n becomes indefinitely large, that is, n
approaches infinity, we arrive at the conception of a continuous frequency
curve or function. This curve is the limiting form of the binomial.
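The binomial term regarded as a function of r can be written directly in Python. The sketch below is illustrative only; the name `binomial_term` is hypothetical. For fixed n and p it returns the probability attached to each of the n + 1 permissible values of r.

```python
from math import comb

def binomial_term(r, n, p):
    """p_r = C(n, r) * p**r * q**(n - r): probability of r successes in n trials."""
    q = 1 - p
    return comb(n, r) * p ** r * q ** (n - r)

# the n + 1 = 11 values of p_r for ten unbiased coins
terms = [binomial_term(r, 10, 0.5) for r in range(11)]
```

Plotting `terms` against r = 0, 1, . . . , 10 gives the discrete graph described above; increasing n adds more and finer steps.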





Frequency curves are in certain instances conceptualized as extend- 
ing along the X axis from minus infinity to plus infinity; that is, the curves 
taper off to zero at the two extremities. Although this is so, the area 
between the curve and the horizontal axis is always finite. For con- 
venience this area is often taken as unity. 

On occasion it becomes necessary to find the proportion of the total
area of the curve between ordinates erected at particular values of X,
that is, between X = a and X = b as shown in Fig. 6.1. This proportion
is the probability that a particular value of X drawn at random from the
population which the curve describes falls between a and b. Because of
this, frequency curves are often referred to as probability curves or proba-
bility distributions. Statisticians use a variety of theoretical frequency
curves as models. The normal curve is one of the more important of these.


6.3 

The normal curve 

In tossing n coins the frequency distribution of heads or tails is approxi-
mated more closely by the normal distribution as n increases in size. The
normal curve is the limiting form of the symmetrical binomial. The
equation for the normal curve is

$$Y = \frac{N}{\sigma\sqrt{2\pi}}\, e^{-(X - \mu)^2 / 2\sigma^2} \qquad (6.2)$$

where Y = height of curve for particular values of X
      $\pi$ = a constant = 3.1416
      e = base of Napierian logarithms = 2.7183
      N = number of cases, which means that the total
          area under the curve is N
      $\mu$ and $\sigma$ = mean and standard deviation of the distribu-
          tion, respectively

We have used the notation $\mu$ and $\sigma$ in this formula to represent the mean
and standard deviation, instead of $\bar{X}$ and s, because the formula is a
theoretical model. Presumably $\mu$ and $\sigma$ may be regarded as population
parameters. If N, $\mu$, and $\sigma$ are known, different values of X may be sub-
stituted in the equation and the corresponding values of Y obtained. If
paired values of X and Y are plotted graphically, they will form a normal
curve with mean $\mu$, standard deviation $\sigma$, and area N.

The normal curve is usually written in standard-score form. Stand-
ard scores have a mean of zero and a standard deviation of 1. Thus
$\mu = 0$ and $\sigma = 1$. The area under the curve is taken as unity; that is,



fig. 6.2 Normal curve showing height of the ordinate at different values
of $x/\sigma$, or z


N = 1. With these substitutions we may write

$$y = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2} \qquad (6.3)$$

Here z is a standard score on X and equal to $(X - \mu)/\sigma$. The score z is
a deviation in standard deviation units measured along the base line of the
curve from a mean of zero, deviations to the right of the mean being posi-
tive and those to the left negative. The curve has unit area and unit
standard deviation. By substituting different values of z in the above
formula, different values of y may be calculated. When z = 0,

$$y = 1/\sqrt{2\pi} = .3989$$

This follows from the fact that $e^0 = 1$. Any term raised to the zero power
is equal to 1. Thus the height of the ordinate at the mean of the normal
curve in standard-score form is given by the number .3989. For z = +1,
y = .2420, and for z = +2, y = .0540. Similarly, the height of the curve
may be calculated for any value of z. In practice the student is not
required to substitute different values of z in the normal-curve formula
and solve for y to obtain the height of the required ordinate. These
values may be obtained from Table A of the Appendix. This table shows
different values of y corresponding to different values of z. It also shows
the area of the curve falling between the ordinates at the mean and dif-
ferent values of z.
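The heights quoted above can be reproduced by substituting in the normal-curve formula. The following Python sketch is illustrative; the function names `ordinate` and `height` are hypothetical. `ordinate` evaluates the standard-score form, and `height` evaluates the general form for N cases with a given mean and standard deviation.

```python
from math import exp, pi, sqrt

def ordinate(z):
    """Height y of the unit normal curve at standard score z."""
    return exp(-z * z / 2) / sqrt(2 * pi)

def height(X, N, mu, sigma):
    """Height Y of a normal curve with area N, mean mu, standard deviation sigma."""
    z = (X - mu) / sigma
    return N / sigma * ordinate(z)

y0, y1, y2 = ordinate(0), ordinate(1), ordinate(2)
```

Rounded to four places these are .3989, .2420, and .0540, the values a table of ordinates would supply.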

The general shape of the normal curve can be observed by inspection
of Fig. 6.2. The curve is symmetrical. It is asymptotic at the extremi-
ties; that is, it approaches but never reaches the horizontal axis. It can
be said to extend from minus infinity to plus infinity. The area under
the curve is finite.


6.4 

Areas under the normal curve 

For many purposes it is necessary to ascertain the proportion of the area
under the normal curve between ordinates at different points on the base
line. We may wish to know (1) the proportion of the area under the
curve between an ordinate at the mean and an ordinate at any specified
point either above or below the mean, (2) the proportion of the total area
above or below an ordinate at any point on the base line, (3) the propor-
tion of the area falling between ordinates at any two points on the base line.

Table A of the Appendix shows the proportion of the area between
the mean of the unit normal curve and ordinates extending from z = 0
to z = 3. Let us suppose that we wish to find the area under the curve
between the ordinates at z = 0 and z = +1. We note from Table A
that this area is .3413 of the total. Thus approximately 34 per cent of
the total area falls between the mean and one standard deviation unit
above the mean. The proportion of the area of the curve between z = 0
and z = 2 is .4772. Thus about 47.7 per cent of the area of the curve falls
between the mean and two standard deviation units above the mean.
The proportion of the area between z = 0 and z = 3 is .49865, or a little
less than 49.9 per cent.

The proportion of the area falling between z = 0 and z = +1 is .3413.
Since the curve is symmetrical the proportion of the area falling between
z = 0 and z = -1 is also .3413. The proportion of the area falling
between the limits z = ±1 is therefore .3413 + .3413 = .6826, or roughly
68 per cent. The proportion of the area falling between z = ±2 is
.4772 + .4772 = .9544, or about 95 per cent. The proportion between
z = ±3 is .49865 + .49865 = .99730, or 99.73 per cent. The area outside
these latter limits is very small and is only .27 per cent. For rough prac-
tical purposes the curve is sometimes taken as extending from z = -3 to
z = +3.

Consider the determination of the proportion of the total area above
or below any point on the base line of the curve. For illustration, let
the point be z = 1. The proportion of the area between the mean and
z = 1 is .3413. The proportion of the area below the mean is .5000. The
proportion of the total area below z = 1 is .5000 + .3413 = .8413.
The proportion above this point is 1.0000 - .8413 = .1587. Similarly,
the proportion of the area above or below any point on the base line can
be readily ascertained.

fig. 6.3 Normal curve showing areas between ordinates at different val-
ues of $x/\sigma$, or z.

fig. 6.4 Normal curve showing area between ordinates at z = .50 and
z = 1.50

Consider the problem of finding the area between ordinates at any
two points on the base line. Let us assume that we require the area
between z = .5 and z = 1.5. From Table A of the Appendix we note
that the proportion of the area between the mean and z = .50 is .1915.
We note also that the area between the mean and z = 1.5 is .4332. The
area between z = .50 and z = 1.5 is obtained by subtracting one area
from the other and is .4332 - .1915 = .2417. The area for any other
segment of the curve may be similarly obtained.
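The three kinds of area discussed in this section can be computed without a table. The Python sketch below is illustrative only and rests on the standard relation between the normal integral and the error function `math.erf`; the function names are hypothetical.

```python
from math import erf, sqrt

def area_mean_to(z):
    """Proportion of the unit normal curve between the mean and z (z >= 0)."""
    return 0.5 * erf(z / sqrt(2))

def area_below(z):
    """Proportion of the total area below an ordinate at z."""
    return 0.5 + 0.5 * erf(z / sqrt(2))

def area_between(z1, z2):
    """Proportion of the area between ordinates at z1 and z2 (z1 < z2)."""
    return area_below(z2) - area_below(z1)

segment = area_between(0.5, 1.5)
```

Rounded to four places, `area_mean_to(1)` is .3413, `area_below(1)` is .8413, and `segment` is .2417. Sums of rounded tabled values, such as .4772 + .4772, may differ from the directly computed area in the last decimal place.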

On occasion we wish to find values of z which include some specified
proportion of the total area. For example, the values of z above and
below the mean which include a proportion .95 of the area may be
required. We select a value of z above the mean which includes a propor-
tion .475 of the total area and a value of z below the mean which also
includes a proportion .475 of the total area. From Table A of the Appen-
dix we observe that the proportion .475 of the area falls between z = 0
and z = 1.96. Since the curve is symmetrical the proportion .475 of the
area also falls between z = 0 and z = -1.96. Thus a proportion .95, or
95 per cent, of the total area falls within the limits z = ±1.96. Also
a proportion .05, or 5 per cent, falls outside these limits. Similarly, it
may be shown that 99 per cent of the area of the curve falls within, and
1 per cent outside, the limits z = ±2.58.
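Finding the limits that enclose a stated proportion of the area is the inverse of the table lookup. A minimal Python sketch, using `NormalDist.inv_cdf` from the standard `statistics` module (Python 3.8 or later), follows; the function name `central_limits` is hypothetical.

```python
from statistics import NormalDist

def central_limits(proportion):
    """Return z such that the limits -z to +z include the given proportion of the area."""
    tail = (1 - proportion) / 2           # area left in each tail
    return NormalDist().inv_cdf(1 - tail)

z95 = central_limits(0.95)
z99 = central_limits(0.99)
```

Rounded to two places these are 1.96 and 2.58.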

6.5 

Areas under the normal curve: illustrative example

The distribution of intelligence quotients obtained by the application of
a particular test is approximately normal with a mean of 100 and a stand-
ard deviation of 15. We are required to estimate what per cent of
individuals in the population have intelligence quotients of 120 and above.
The intelligence quotient of 120 in standard-score form is

z = (120 - 100)/15 = 1.33

Thus an intelligence quotient of 120 is 1.33 standard deviation units
above the mean. Reference to a table of areas under the normal curve
shows that the proportion of the area above a standard score of 1.33 is
.092. Thus we estimate that on this particular test about 9.2 per cent
of the population have intelligence quotients equal to or greater than 120.

We are required to estimate for the same test the middle range of
intelligence quotients which includes 50 per cent of the population. A
table of areas under the normal curve shows that 25 per cent of the area
under the curve falls between the mean and a standard score of -.675.
Also 25 per cent of the area falls between the mean and a standard score
of +.675. Thus 50 per cent of the area falls between the limits of
z = ±.675. The standard-score scale has a mean of zero and a standard
deviation of unity. Here we must transform standard scores to the
original scale of intelligence quotients with a mean of 100 and a standard
deviation of 15. To transform standard scores to intelligence quotients
we multiply the standard score by 15 and add 100. Thus the standard
score -.675 is transformed to 15 × (-.675) + 100 = 89.88 and +.675
to 15 × .675 + 100 = 110.12. Thus we estimate that about 50 per cent
of the population have intelligence quotients within a range of roughly
90 and 110.
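Both parts of this example can be checked with a brief Python sketch based on `statistics.NormalDist`. The variable names are hypothetical. The standard score is rounded to two places before the area is taken, as a printed table would require; this rounding is an assumption made to match the tabled value.

```python
from statistics import NormalDist

unit = NormalDist()                     # standard-score scale: mean 0, s.d. 1
iq = NormalDist(mu=100, sigma=15)       # scale of intelligence quotients

z = round((120 - 100) / 15, 2)          # 1.33, as read to two places
p_120_and_above = 1 - unit.cdf(z)       # proportion with quotients of 120 or more

# middle 50 per cent: 25 per cent of the area on each side of the mean
lower, upper = iq.inv_cdf(0.25), iq.inv_cdf(0.75)
```

The proportion rounds to .092, and the limits round to 89.88 and 110.12, or roughly 90 and 110.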





6.6 

The normal approximation to the binomial 

The observation has been made that as n increases in size the symmetrical
binomial is more closely approximated by the normal distribution. This
means that the normal distribution may be used to estimate binomial
probabilities. Consider a situation where ten coins are tossed a large
number of times. What is the probability of obtaining seven or
more heads? Here the mean of the binomial is $\mu = 10 \times \tfrac{1}{2} = 5.0$ and
the standard deviation is $\sigma = \sqrt{10 \times \tfrac{1}{2} \times \tfrac{1}{2}} = 1.58$. Because the normal dis-
tribution is continuous, and not discrete, we consider the value 7 as cover-
ing the exact limits 6.5 to 7.5. Thus we must ascertain the proportion of
the area of the normal curve falling above an ordinate at 6.5, the mean of
the curve being 5.0 and the standard deviation 1.58. In standard-score
form the value 6.5 is equivalent to z = (6.5 - 5.0)/1.58 = .949. The
proportion of the area of the normal curve falling above an ordinate at
z = .949 can be readily ascertained from Table A of the Appendix and is
.171. Thus using the normal-curve approximation to the binomial we
estimate the probability of obtaining seven or more heads in tossing ten
coins as .171. We may compare this with the exact probabilities obtained
directly from the binomial expansion shown in Table 6.1. This proba-
bility is .172. Here we note that the discrepancy between the estimate


Table 6.1

Comparison of binomial probabilities with corresponding
normal approximations for n = 10 and p = 1/2

No. of     Exact binomial     Normal
heads      probability        approximation

 10            .001               .002
  9            .010               .011
  8            .044               .044
  7            .117               .114
  6            .205               .205
  5            .246               .248
  4            .205               .205
  3            .117               .114
  2            .044               .044
  1            .010               .011
  0            .001               .002

Total         1.000              1.000









obtained from the normal curve and the exact binomial probability is
trivial.

Table 6.1 compares the binomial and normal probabilities for n = 10
and p = 1/2. We note that in this instance the differences between the
exact binomial probabilities and the corresponding normal approxima-
tions are small.

The accuracy of the approximation depends both on n and p; as n
increases in size the accuracy of the approximation is improved. For any
n, as p departs from 1/2 the approximation becomes less accurate.
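The comparison between exact and approximate probabilities can be sketched in Python. The function names below are hypothetical, and the sketch assumes that each whole number r is treated as covering the exact limits r - .5 to r + .5. Values computed this way agree with Table 6.1 only to within rounding, since tabled figures depend on how z was rounded.

```python
from math import comb, sqrt
from statistics import NormalDist

def exact_prob(r, n=10, p=0.5):
    """Exact binomial probability of r heads in n tosses."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

def normal_prob(r, n=10, p=0.5):
    """Normal-curve area between the exact limits r - .5 and r + .5."""
    curve = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
    return curve.cdf(r + 0.5) - curve.cdf(r - 0.5)

def normal_tail(r, n=10, p=0.5):
    """Normal-curve estimate of the probability of r or more heads."""
    curve = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
    return 1 - curve.cdf(r - 0.5)

# print the two columns side by side, from 10 heads down to 0
for r in range(10, -1, -1):
    print(f"{r:2d}  {exact_prob(r):.3f}  {normal_prob(r):.3f}")
```

Calling the same functions with a larger n, or with p farther from 1/2, gives a direct way of seeing how the accuracy of the approximation changes.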

6.7 

Summary of properties of the normal curve 

The following is a summary of properties of the normal curve.

1 The curve is symmetrical. The mean, median, and
mode coincide.

2 The maximum ordinate of the curve occurs at the
mean, that is, where z = 0, and in the unit normal
curve is equal to .3989.

3 The curve is asymptotic. It approaches but does not
meet the horizontal axis and extends from minus
infinity to plus infinity.

4 The points of inflection of the curve occur at points ±1
standard deviation unit above and below the mean.
Thus the curve changes from convex to concave in
relation to the horizontal axis at these points.

5 Roughly 68 per cent of the area of the curve falls
within the limits ±1 standard deviation unit from the
mean.

6 In the unit normal curve the limits z = ±1.96 include
95 per cent and the limits z = ±2.58 include 99 per
cent of the total area of the curve, 5 per cent and 1 per
cent of the area, respectively, falling beyond these
limits.

EXERCISES 

1 Find the height of the ordinate of the normal curve at the following
z values: -2.15, -1.53, +.07, +.99, +2.76.

2 Consider a normally distributed variable with $\bar{X}$ = 50 and s = 10.
For N = 200 find the height of the ordinates at the following values of
X: 25, 35, 49, 57, and 63.





3 Find the proportion of the area of the normal curve (a) between the
mean and z = 1.49, (b) between the mean and z = 1.20, (c) to the
right of z = .25, (d) to the right of z = 1.50, (e) to the left of
z = -1.26, (f) to the left of z = .95, (g) between z = ±.50, (h)
between z = -.75 and z = 1.50, (i) between z = 1.00 and z = 1.90,
(j) between z = 1.00 and z = 1.01.

4 Find a value of z such that the proportion of the area (a) to the right
of z is .25, (b) to the left of z is .90, (c) between the mean and z is .40,
(d) between ±z is .80.

5 On the assumption that IQ's are normally distributed in the popula-
tion with a mean of 100 and a standard deviation of 15, find the pro-
portion of people with IQ's (a) above 1)15, (b) above 120, (c) below 90,
(d) between 75 and 125.

6 A teacher decides to fail 25 per cent of the class. Examination marks
are roughly normally distributed, with a mean of 72 and a standard
deviation of 6. What mark must a student make to pass?

7 In tossing 100 coins, estimate, using the normal approximation to the
binomial, the probability of obtaining (a) more than 110 heads, (b)
less than 95 heads, (c) between 95 and 110 heads.

8 Error scores on a maze test for a particular strain of rats are known
through prolonged experimentation to have an approximately normal
distribution with a mean of 32 and a standard deviation of 8. In one
experiment a control sample of six animals contains one animal with an
error score of 66. What arguments may be advanced for discarding
the results for this animal?

9 Scores on a particular psychological test are normally distributed,
with a mean of 50 and a standard deviation of 10. The decision is
made to use a letter grade system A, B, C, D, and E with the propor-
tions .10, .20, .40, .20, and .10 in the five grades, respectively. Find
the score intervals for the five letter grades.

10 The following are data for test scores for two age groups:

             11-year     14-year
             group       group

$\bar{X}$            48          56
s                 8          12
N               500         800


Assuming normality, estimate how many of the 11-year-olds do better 
than the average 14-year-old and how many of the 14-year-olds do 
worse than the average 11 -year-old. 



Correlation 


7 

7.1

Introduction

Hitherto we have considered the description of a single variable. We now 
approach the problem of describing the degree of simultaneous or con- 
comitant variation of two variables. The data under consideration, 
sometimes called bivariate data, consist of pairs of measurements. The 
data, for example, may be measures both of height and weight for a group 
of school children, or measures both of intelligence and scholastic per- 
formance for a group of university students, or error scores for a group of 
experimental animals in running two different mazes. The essential 
feature of the data is that one observation can be paired with another 
observation for each member of the group. The study of this type of 
data has two closely related aspects, correlation and prediction. Correla- 
tion is concerned with describing the degree of relation between variables. 
Prediction is concerned with estimating one variable from a knowledge of 
another. We shall use for illustration the record of scores obtained on a 
psychological test administered to students entering university and exam- 
ination marks obtained by these same students at the end of the first 
year of university work. The investigator may concern himself with 
obtaining a simple summary description of the degree of relation or cor- 
relation between test scores and examination marks. On the other hand, 
he may focus attention on the prediction of examination marks from a
knowledge of psychological test scores, his purpose being to use psycho-
logical test scores to provide estimates, on university entrance, of subse-
quent scholastic performance.

Historically, the study of the prediction of one variable from a 
knowledge of another preceded the development of measures of correla- 
tion. In the year 1885 Francis Galton published a paper called Regres- 
sion towards Mediocrity in Hereditary Stature . Galton was interested in 
predicting the physical characteristics of offspring from a knowledge of 
the physical characteristics of their parents. He observed, for example, 
that the offspring of tall parents tended on the average to be shorter than 





their parents, whereas the offspring of short parents tended on the aver- 
age to be taller than their parents. He used the word “regression” to 
refer to this effect. In modern statistics the term regression no longer has 
the biological implication assigned to it by Galton. In general, regression 
has to do with the prediction of one variable from a knowledge of another. 
Karl Pearson extended Galton's ideas of regression and developed the
methods of correlation extensively used today.

The most widely used measure of correlation is the Pearson product-
moment correlation coefficient. This measure is used where the variables
are quantitative, that is, of the interval or ratio type. Other varieties of
correlation have been developed for use with nominal and ordinal varia-
bles. One measure commonly used to describe the relationship between
two nominal variables is the contingency coefficient. Methods used with
ordinal variables are called rank-order correlation methods. These special
types of correlation will be discussed in later chapters.

In this chapter we shall present a discussion of correlation and pro-
ceed in Chap. 8 to a discussion of prediction and its relation to correlation.
The reader will bear in mind that correlation and prediction are two
closely related topics. Certain topics pertaining to the interpretation of
the correlation coefficient and assumptions underlying its use can only be
discussed following a consideration of prediction.

7.2

Relations between paired observations 

Consider a group comprised of N members. Denote these by $A_1$, $A_2$,
$A_3$, . . . , $A_N$. Measurements are available on each member on two
variables, X and Y. The data may be represented symbolically as
follows:

Members       Measurement
               X        Y

$A_1$          $X_1$    $Y_1$
$A_2$          $X_2$    $Y_2$
$A_3$          $X_3$    $Y_3$
 .              .        .
$A_N$          $X_N$    $Y_N$


Let us assume that measurements have been arranged in order of 
magnitude on X extending from $X_1$, the highest, to $X_N$, the lowest, 
measurement. Given this arrangement on X, we may consider the possible 
arrangements of Y with respect to X. Consider an arrangement where 
the values of Y are in order of magnitude extending from the highest to 
the lowest. Thus the member who is highest on X is also highest on Y, 
the member who is next highest on X is next highest on Y, and so on, 
until the member who is lowest on X is also lowest on Y. This situation 
represents the maximum positive relation between the two variables. 
Consider now an arrangement where the values of Y are reversed so that 
$Y_1$ is the lowest and $Y_N$ is the highest. The member who is highest on X 
is lowest on Y, the member who is next highest on X is next lowest on Y, 
and so on, until the member who is lowest on X is highest on Y. This 
situation represents the maximum negative relation between the 
variables. Consider a situation where the arrangement of Y is strictly 
random in relation to X. Values of Y may be inserted in a hat, shuffled, 
drawn at random, and paired with values of X. This is a situation of 
independence. The two sets of variate values bear a random relation to 
each other. Under this arrangement we may state that no relation exists 
between X and Y. Between the two extreme arrangements, representing 
the maximum positive and negative relation, we may consider 
arrangements which represent varying degrees of relation in either a 
positive or negative direction. To illustrate, let us assume that the 
values of X for the members $A_1$, $A_2$, $A_3$, $A_4$, and $A_5$ are the 
integers 5, 4, 3, 2, and 1. If the values of Y are the same integers and 
are also arranged in the order 5, 4, 3, 2, and 1, we have a maximum 
positive relation. If values of Y are arranged in the order 4, 5, 3, 2, 
and 1, we have clearly a high positive relation, although not the highest 
possible. If values of Y are arranged in the order 1, 2, 3, 4, and 5, we 
have a maximum negative relation. Again an arrangement on Y of the kind 
1, 2, 4, 3, 5 would be high negative, although not the highest possible. 
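
The four arrangements just described can be checked numerically. The sketch below anticipates the product-moment coefficient defined in Sec. 7.3 and computes it from deviation scores; the helper name `pearson_r` and the dictionary labels are illustrative choices, not part of the text.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Product-moment correlation computed from deviation scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_x2 = sum((x - mean_x) ** 2 for x in xs)
    sum_y2 = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / sqrt(sum_x2 * sum_y2)

# X values for members A1..A5, and the four arrangements of Y from the text.
x = [5, 4, 3, 2, 1]
arrangements = {
    "maximum positive": [5, 4, 3, 2, 1],
    "high positive": [4, 5, 3, 2, 1],
    "maximum negative": [1, 2, 3, 4, 5],
    "high negative": [1, 2, 4, 3, 5],
}

results = {label: pearson_r(x, y) for label, y in arrangements.items()}
for label, r in results.items():
    print(f"{label}: r = {r:+.2f}")
```

Interchanging a single adjacent pair moves the coefficient away from its limit without changing its sign, which is the sense in which the second and fourth arrangements are high but not the highest possible.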

Relations of the kind described above may be examined by plotting 
the paired measurements on graph paper, each pair of observations being 
represented by a point. Such a plotting of measurements is sometimes 
called a scatter diagram. Inspection of a scatter diagram yields an 
intuitive appreciation of the degree of relation between the two variables. 
Figure 7.1 shows four such diagrams. 

Figure 7.1a is a graphical representation of a high positive relation. 
Note that the points fall very close to a straight line. If the points fall 
exactly on a straight line, a perfect positive relation exists between the 
variables. Figure 7.1b shows a low positive relation. Figure 7.1c shows 
a relation which is more or less random. No systematic tendency is 
observed for high values of X to be associated with high values of Y and 
low values of X to be associated with low values of Y, or vice versa. 



Fig. 7.1 (a) High positive correlation. (b) Low positive correlation. 
(c) Zero correlation. (d) Negative correlation. 

Figure 7.1d shows a fairly high negative relation. Again, if all the points 
fall exactly along a straight line, a perfect negative relation exists. It is 
obvious that between the two extremes of a perfect positive and a perfect 
negative relation an indefinitely large number of possible arrangements of 
points may occur, representing an indefinitely large number of possible 
relations between the two variables. 

7.3 

The correlation coefficient 

Measures of correlation are conventionally defined to take values ranging 
from -1 to +1. A value of -1 describes a perfect negative relation. 
All points lie on a straight line, and X decreases as Y increases. A value 
of +1 describes a perfect positive relation. All points lie on a straight 
line, and X increases as Y increases. A value of 0 describes the absence 
of a relation. The variable X is independent of Y or bears a random 
relation to Y. Measures of correlation take positive values where the 
relation is positive and negative values where the relation is negative. 

The most commonly used measure of correlation is the Pearson 
product-moment correlation coefficient. Many forms of correlation are 
particular cases of this coefficient. Let X and Y be two sets of paired 
observations with standard deviations $s_x$ and $s_y$. We may represent the 
paired observations in standard-score form by taking deviations from the 
mean and dividing by the standard deviation. Thus 

$$z_x = \frac{X - \bar{X}}{s_x}$$ 

$$z_y = \frac{Y - \bar{Y}}{s_y}$$ 

The standard scores have a mean of zero and a standard deviation of 
unity. The product-moment correlation coefficient, denoted by the 
letter r, is the sum of products of standard scores, divided by N — 1. 
The formula for r in standard-score form is 


$$r = \frac{\sum z_x z_y}{N - 1} \tag{7.1}$$ 


Thus the correlation coefficient may be obtained by converting the two 
variables to standard-score form, summing their product, and dividing by 
N - 1. 
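
As a minimal sketch of the standard-score definition, the following applies it to the ten paired observations that appear in Table 7.1. The function name `standard_scores` is an illustrative choice; the standard deviation uses the divisor N - 1, as the text assumes throughout.

```python
from math import sqrt

# Paired observations from Table 7.1.
X = [5, 10, 5, 11, 12, 4, 3, 2, 7, 1]
Y = [1, 6, 2, 8, 5, 1, 4, 6, 5, 2]

def standard_scores(values):
    """Convert raw scores to standard scores (divisor N - 1 for the variance)."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

z_x = standard_scores(X)
z_y = standard_scores(Y)
n = len(X)

# r is the sum of products of paired standard scores, divided by N - 1.
r = sum(a * b for a, b in zip(z_x, z_y)) / (n - 1)
print(f"r = {r:+.2f}")
```

With this divisor the squared standard scores of each variable sum to N - 1, which is why N - 1 also appears in the denominator of r.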

A brief and rather incomplete digression on the rationale underlying 
the above coefficient is appropriate here. Consider a set of paired 
observations in standard-score form. The sum of products of standard 
scores $\sum z_x z_y$ is readily observed to be a measure of the degree of 
relationship between the two variables. Let us consider the maximum and 
minimum values of $\sum z_x z_y$. This sum of products is observed to take 
its maximum possible value when (1) the values of $z_x$ and $z_y$ are in the 
same order and (2) every value of $z_x$ is equal to the value of $z_y$ with 
which it is paired, the two sets of paired standard scores being 
identical. If the paired standard scores are plotted on graph paper, all 
points will fall exactly along a straight line with positive slope. Since 
all pairs of observations are such that $z_x = z_y$, we may write 
$z_x z_y = z_x^2 = z_y^2$ and $\sum z_x z_y = \sum z_x^2 = \sum z_y^2$. 
The quantity $\sum z_x^2 = \sum z_y^2 = N - 1$, as shown in Sec. 4.12. Thus 
we observe that the maximum possible value of $\sum z_x z_y$ is equal to 
N - 1. Similarly, $\sum z_x z_y$ will take its minimum possible value when 
(1) the values of $z_x$ and $z_y$ are in inverse order and (2) every value of 
$z_x$ has the same absolute numerical value as the $z_y$ with which it is 
paired, but differs in sign. This minimum value of $\sum z_x z_y$ is readily 
shown to be equal to -(N - 1). Graphically, all points will fall exactly 
along a straight line with negative slope. When $z_x$ and $z_y$ bear no 
systematic relation to each other, the expected value of $\sum z_x z_y$ will 
be zero. We may define a coefficient of correlation as the ratio of the 
observed value of $\sum z_x z_y$ to the maximum possible value of this 
quantity; that is, r is defined as $\sum z_x z_y/(N - 1)$. Since 
$\sum z_x z_y$ has a range extending from N - 1 to -(N - 1), the 
coefficient r will extend from +1 to -1. The reader will note that for any 
particular set of paired standard scores the maximum and minimum 
values of $\sum z_x z_y$ obtained by arranging the paired scores in direct 
and inverse order are not necessarily N - 1 and -(N - 1). A maximum 
value equal to N - 1 will occur only when the paired observations have 
the characteristic that every value of $z_x$ is equal to the value of $z_y$ 
with which it is paired. A minimum value of -(N - 1) will occur only when 
every value of $z_x$ is equal to $z_y$ in absolute value, but differs in sign. 
When the data do not have these characteristics, the limits of the range 
of r, for the particular set of paired observations under consideration, 
will be less than +1 and greater than -1. 
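
The closing remark, that the attainable limits of r for a particular set of scores may fall short of +1 and -1, can be illustrated with a short sketch. It re-pairs the Table 7.1 scores in direct and in inverse order; the variable names `r_max` and `r_min` are illustrative.

```python
from math import sqrt

# Paired observations from Table 7.1.
X = [5, 10, 5, 11, 12, 4, 3, 2, 7, 1]
Y = [1, 6, 2, 8, 5, 1, 4, 6, 5, 2]

def standard_scores(values):
    """Convert raw scores to standard scores (divisor N - 1 for the variance)."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

n = len(X)
z_x_sorted = sorted(standard_scores(X))
z_y_sorted = sorted(standard_scores(Y))

# Direct order: lowest z_x paired with lowest z_y, and so on upward.
r_max = sum(a * b for a, b in zip(z_x_sorted, z_y_sorted)) / (n - 1)

# Inverse order: lowest z_x paired with highest z_y, and so on.
r_min = sum(a * b for a, b in zip(z_x_sorted, reversed(z_y_sorted))) / (n - 1)

print(f"largest attainable r:  {r_max:+.3f}")
print(f"smallest attainable r: {r_min:+.3f}")
```

Because the two sets of standard scores are not identical value for value, neither limit reaches unity even under the most favourable pairing.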


7.4 

Calculation of the correlation coefficient from 
ungrouped data 

The formula for the correlation coefficient in standard-score form is 
$r = \sum z_x z_y/(N - 1)$. The calculation of a correlation coefficient 
using this formula is somewhat laborious, as it requires the conversion of 
all values to standard scores. Since $z_x = (X - \bar{X})/s_x$ and 
$z_y = (Y - \bar{Y})/s_y$, by substitution we may write the formula for the 
correlation coefficient in deviation-score form. Thus 

$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{(N - 1)s_x s_y} 
= \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}} 
= \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} \tag{7.2}$$ 

where x and y are deviations from the means $\bar{X}$ and $\bar{Y}$ respectively. 

The above formula for the correlation coefficient may be used for 
computational purposes. The calculation is illustrated in Table 7.1. 
The first two columns contain the paired observations on X and Y. 
These columns are summed and divided by N to obtain the means $\bar{X}$ and 
$\bar{Y}$. Column 3 contains the deviations from the mean of X, and col. 4 
the deviations from the mean of Y. Columns 5 and 6 contain the squares of 
these deviations. These columns are summed to obtain $\sum x^2$ and 
$\sum y^2$. Column 7 contains the products of x and y, and this column is 
summed to obtain $\sum xy$. The correlation coefficient in this example is 
+.58. 

For certain purposes it is desirable to express the formula for the 
correlation in terms of the raw scores or the original observations. This 
formula is as follows: 

$$r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{[N\sum X^2 - (\sum X)^2][N\sum Y^2 - (\sum Y)^2]}} \tag{7.3}$$ 

This is one of the more convenient formulas to use where a calculating 
machine is available. Some modern calculating machines are so designed 
that pairs of observations may be entered successively on the machine and 
the terms $\sum X^2$, $\sum Y^2$, and $\sum XY$ obtained in a single operation. 

Where a calculating machine is not available, the formula usually 
involves rather large and unwieldy numbers and the formula in deviation 
form may be preferred. 

Table 7.1 

Calculation of the correlation coefficient from ungrouped data 
using deviation scores 

   1      2      3      4      5       6       7 
   X      Y      x      y    $x^2$   $y^2$    xy 

   5      1     -1     -3      1       9      +3 
  10      6     +4     +2     16       4      +8 
   5      2     -1     -2      1       4      +2 
  11      8     +5     +4     25      16     +20 
  12      5     +6     +1     36       1      +6 
   4      1     -2     -3      4       9      +6 
   3      4     -3      0      9       0       0 
   2      6     -4     +2     16       4      -8 
   7      5     +1     +1      1       1      +1 
   1      2     -5     -2     25       4     +10 

  60     40      0      0    134      52      48 

$\bar{X} = 6.0$, $\bar{Y} = 4.0$, $\sum x^2 = 134$, $\sum y^2 = 52$, $\sum xy = 48$ 

$$r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} = \frac{48}{\sqrt{134 \times 52}} = +.58$$ 

The application of the formula for computing the correlation 
coefficient from raw scores is illustrated in Table 7.2. The first two 
columns contain the paired observations on X and Y. These columns are 
summed to obtain $\sum X$ and $\sum Y$. Columns 3 and 4 contain the squares 
of the observations, and these are summed to obtain $\sum X^2$ and 
$\sum Y^2$. Column 5 contains the product terms XY, and the sum of this 
column is $\sum XY$. The correlation is +.58, which checks with the value 
obtained by the previous method using deviation scores. 
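
The agreement between the deviation-score formula (7.2) and the raw-score formula (7.3) can be confirmed with a brief sketch using the same ten pairs of observations. The variable names are illustrative; only the data and the two formulas come from the text.

```python
from math import sqrt

# Paired observations from Tables 7.1 and 7.2.
X = [5, 10, 5, 11, 12, 4, 3, 2, 7, 1]
Y = [1, 6, 2, 8, 5, 1, 4, 6, 5, 2]
n = len(X)

# Deviation-score form, formula (7.2).
mean_x = sum(X) / n
mean_y = sum(Y) / n
sum_dev_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))
sum_dev_x2 = sum((x - mean_x) ** 2 for x in X)
sum_dev_y2 = sum((y - mean_y) ** 2 for y in Y)
r_dev = sum_dev_xy / sqrt(sum_dev_x2 * sum_dev_y2)

# Raw-score form, formula (7.3).
sum_X, sum_Y = sum(X), sum(Y)
sum_X2 = sum(x * x for x in X)
sum_Y2 = sum(y * y for y in Y)
sum_XY = sum(x * y for x, y in zip(X, Y))
r_raw = (n * sum_XY - sum_X * sum_Y) / sqrt(
    (n * sum_X2 - sum_X ** 2) * (n * sum_Y2 - sum_Y ** 2)
)

print(f"deviation form: r = {r_dev:+.2f}")
print(f"raw-score form: r = {r_raw:+.2f}")
```

The raw-score form needs only running totals, which is the property that made it convenient for a calculating machine.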


Table 7.2 

Calculation of the correlation coefficient from ungrouped 
data using raw scores 

   1      2       3        4       5 
   X      Y     $X^2$    $Y^2$    XY 

   5      1      25        1       5 
  10      6     100       36      60 
   5      2      25        4      10 
  11      8     121       64      88 
  12      5     144       25      60 
   4      1      16        1       4 
   3      4       9       16      12 
   2      6       4       36      12 
   7      5      49       25      35 
   1      2       1        4       2 

  60     40     494      212     288 

$\sum X = 60$, $\sum Y = 40$, $\sum X^2 = 494$, $\sum Y^2 = 212$, $\sum XY = 288$ 

$$r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{[N\sum X^2 - (\sum X)^2][N\sum Y^2 - (\sum Y)^2]}} 
= \frac{10 \times 288 - 60 \times 40}{\sqrt{(10 \times 494 - 60^2)(10 \times 212 - 40^2)}} 
= \frac{480}{\sqrt{1{,}340 \times 520}} = +.58$$ 


7.5 

Bivariate frequency distributions 

In Chap. 2 we discussed the construction of frequency distributions for a 
single variable. A frequency distribution was defined as an arrangement 
of the data showing the frequency of occurrence of the observations 
within defined ranges of the values of the variable, the defined ranges 
being the class intervals. Where one variable only is involved, the 
distribution may be spoken of as univariate. The frequency-distribution 
idea may be readily extended to two-variable situations. A frequency 
distribution involving two variables is known as a bivariate frequency 
distribution. 

A bivariate frequency distribution is a table composed of a number 
of rows and columns. The columns correspond to class intervals of the X 
variable and the rows to class intervals of the Y variable. Each pair of 
observations is entered as a tally in its appropriate cell. To illustrate, 
Table 7.3 shows a bivariate frequency distribution for a set of paired 
observations, these being scores on two forms of a French reading test. 
In constructing such a distribution a person who makes a score of 27 on 
Form A and a score of 31 on Form B is entered as a tally in the cell that is 
common to the row corresponding to the class interval 25 to 29 on Form A 
and the column corresponding to the class interval 30 to 34 on Form B. 
Similarly, every pair of observations is entered as a tally in its appropriate 
cell. The tallies in each cell are then counted, and their number recorded. 
These numbers are the bivariate frequencies. By summing the bivariate 
frequencies in the rows we obtain, as shown in Table 7.3, the frequency 
distribution for the Y variable, and by summing the columns we obtain 
the frequency distribution for the X variable. The separate frequency 
distributions of X and Y are usually written at the bottom and to the 
right of the table. In the selection of class intervals for X and Y the 
usual conventions regarding class intervals apply. 


Table 7.3 

Bivariate frequency distribution for two forms of French 
reading test 

(Rows: class intervals of Y, Form A. Columns: class intervals of X, 
Form B. Cell entries: tallies and bivariate frequencies, with the 
frequency distributions of Y and X in the margins.) 
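
The tallying procedure can be sketched in a few lines. The score pairs below are hypothetical and invented purely for illustration (they are not the data of Table 7.3); the class width of 5 matches the intervals named in the text, and the helper `class_interval` is an assumed name.

```python
from collections import Counter

def class_interval(score, width=5):
    """Return the (lower, upper) limits of the class interval containing score."""
    lower = (score // width) * width
    return (lower, lower + width - 1)

# Hypothetical (Form A, Form B) score pairs, for illustration only.
pairs = [(27, 31), (12, 14), (13, 11), (22, 18), (8, 9), (27, 33)]

# Each pair is tallied in the cell common to its row (Form A, the Y variable)
# and its column (Form B, the X variable).
cells = Counter((class_interval(a), class_interval(b)) for a, b in pairs)

# Summing cells across each row gives the distribution of Y;
# summing down each column gives the distribution of X.
row_totals = Counter()
col_totals = Counter()
for (row, col), freq in cells.items():
    row_totals[row] += freq
    col_totals[col] += freq

for (row, col), freq in sorted(cells.items()):
    print(f"Form A {row[0]}-{row[1]}, Form B {col[0]}-{col[1]}: {freq}")
```

A person scoring 27 on Form A and 31 on Form B lands in the cell for the 25 to 29 row and the 30 to 34 column, as in the example described above.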

Methods exist for the calculation of correlation coefficients from 
bivariate frequency distributions. These methods are now infrequently 
used and will not be described here. The idea of a bivariate frequency 
table is, however, of some importance. 

7.6 

The variance of sums and differences 

Let X and Y be two sets of measurements for the same group of 
individuals. These, for example, may be marks on mathematics and history 
examinations for a group of university students. What is the variance 
of X + Y? If mathematics and history marks are added together, what 
is the variance of the sums? 

The sum of X and Y is X + Y. The mean of the sum of X and Y is 
$\bar{X} + \bar{Y}$, or the sum of the two means. We may then write the 
variance of sums as follows: 

$$\begin{aligned} 
s_{x+y}^2 &= \frac{\sum [(X + Y) - (\bar{X} + \bar{Y})]^2}{N - 1} \\ 
&= \frac{\sum [(X - \bar{X}) + (Y - \bar{Y})]^2}{N - 1} \\ 
&= \frac{\sum (X - \bar{X})^2}{N - 1} + \frac{\sum (Y - \bar{Y})^2}{N - 1} 
+ \frac{2\sum (X - \bar{X})(Y - \bar{Y})}{N - 1} \\ 
&= s_x^2 + s_y^2 + 2rs_x s_y 
\end{aligned} \tag{7.4}$$ 

The variance of the sum of X and Y is the sum of the two variances 
plus $2rs_x s_y$. If the correlation between the two variables is zero, then 
$2rs_x s_y = 0$ and the variance of sums is simply the sum of the two 
variances. Terms of the kind $rs_x s_y$ are sometimes called covariance 
terms, or covariances. 

Similarly, the variance of the differences between X and Y, the 
variance of X - Y, is readily shown to be 

$$s_{x-y}^2 = s_x^2 + s_y^2 - 2rs_x s_y \tag{7.5}$$ 

The variance of differences is the sum of the two variances minus the 
covariance term $2rs_x s_y$. 

Alternative formulas for the correlation coefficient may be obtained 
from the formulas for the variance of sums and differences by writing 
these explicitly for r. From the variance of sums we obtain 

$$r = \frac{s_{x+y}^2 - s_x^2 - s_y^2}{2s_x s_y} \tag{7.6}$$ 

From the variance of differences we obtain 

$$r = \frac{s_x^2 + s_y^2 - s_{x-y}^2}{2s_x s_y} \tag{7.7}$$ 

These formulas can be readily adapted for computational purposes. 
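
As a check on formulas (7.4) to (7.7), the sketch below forms the sums and differences of the Table 7.1 observations directly and recovers r from their variances. It relies on `statistics.variance`, which uses the divisor N - 1 as the chapter does; the variable names are illustrative.

```python
from math import sqrt
from statistics import variance  # sample variance, divisor N - 1

# Paired observations from Table 7.1.
X = [5, 10, 5, 11, 12, 4, 3, 2, 7, 1]
Y = [1, 6, 2, 8, 5, 1, 4, 6, 5, 2]

var_x = variance(X)
var_y = variance(Y)
var_sum = variance([x + y for x, y in zip(X, Y)])    # variance of X + Y
var_diff = variance([x - y for x, y in zip(X, Y)])   # variance of X - Y

two_sx_sy = 2 * sqrt(var_x) * sqrt(var_y)

# Formula (7.6): r from the variance of sums.
r_from_sum = (var_sum - var_x - var_y) / two_sx_sy

# Formula (7.7): r from the variance of differences.
r_from_diff = (var_x + var_y - var_diff) / two_sx_sy

print(f"variance of sums:        {var_sum:.3f}")
print(f"variance of differences: {var_diff:.3f}")
print(f"r from sums:        {r_from_sum:+.2f}")
print(f"r from differences: {r_from_diff:+.2f}")
```

Because the correlation here is positive, the variance of the sums exceeds the sum of the two variances and the variance of the differences falls below it.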


EXERCISES 

1 Would you expect the correlation between the following to be positive, 
negative, or about zero? (a) The intelligence of parents and their 
offspring, (b) scholastic success and annual income 10 years after 
graduation, (c) age and mental ability, (d) marks on examinations in 
physics and mathematics, (e) wages and the cost of living, (f) birth 
rate and the numerosity of storks, (g) scores on a dominance-submission 
test for husbands and scores for their wives. 

2 The following are paired measurements: 

X 5 8 9 7 0 1 

Y 3 7 8 8 5 9 

Compute the correlation between X and Y. 

3 Show that 

$$\frac{\sum xy}{(N - 1)s_x s_y} = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}$$ 

4 When N = 2, what are the possible values of the correlation 
coefficient? 




5 The correlation coefficient is not necessarily equal to 1 when the 
paired measurements are in exactly the same rank order. Discuss. 

6 Calculate the correlation coefficient for the following data using 
formula (7.3). 

   X      Y      X      Y      X      Y 

  22     18     19     25     11     17 
  If)    10      7     30      5      6 
   9     81      0     27     26     45 
   7      8     40     45     19     30 
   4      2     11     18      8     18 
  4r>    80     27     18      1      3 
  19     12     19     37      9      7 
  20     10     80     42     18     28 
  35     47     25     20     40     21 
  49     22     10     12      9     25 


7 Show that 

$$s_{x-y}^2 = s_x^2 + s_y^2 - 2rs_x s_y$$ 

8 Write a formula for the variance of the sum of three variables. 

9 Under what conditions will the variance of the sum of two variables 
equal the variance of the difference between two variables? 

10 Is the correlation between X and Y changed by adding a constant to 
X or by multiplying X by a constant? 

11 The formulas in this chapter assume that the variance is defined as 
$s^2 = \sum (X - \bar{X})^2/(N - 1)$. What is the formula for r in 
standard-score form if the variance is defined as 
$s^2 = \sum (X - \bar{X})^2/N$? 



Prediction in Relation 

to Correlation 


8.1 

Introduction 

Psychologists and educationists are frequently concerned with problems 
of prediction. The educational psychologist is interested in predicting 
the scholastic performance of a child from a knowledge of intelligence test 
scores. The industrial psychologist in the selection of an individual for 
a particular type of employment makes a prediction about the subsequent 
job performance of that individual from information available at the time 
of selection. The clinical psychologist may direct his attention to 
predicting the patient's receptivity to treatment from information obtained 
prior to treatment. In many areas of human endeavor predictions about 
the subsequent behavior of individuals are required. A somewhat 
elaborate statistical technology has evolved for dealing with the prediction 
problem. In this chapter we shall restrict attention to the simplest 
aspect of prediction, the prediction of one variable from a knowledge of 
another. 

Prediction and correlation are closely related topics, and an 
understanding of one requires an understanding of the other. The presence 
of a zero correlation between two variables X and Y may usually be 
interpreted to mean that they bear no systematic relation to each other. 
A knowledge of X tells us nothing about Y, and a knowledge of Y tells 
us nothing about X. In predicting X from Y or Y from X no prediction 
better than a random guess is possible. The presence of a nonzero 
correlation between X and Y implies that if we know something about X we 
know something about Y, and vice versa. If knowing X implies some 
knowledge of Y, a prediction of Y from X is possible which is better than 
a random guess about Y made in the absence of a knowledge of X. The 
greater the absolute value of the correlation between X and Y, the more 
accurate the prediction of one variable from the other. If the correlation 
between X and Y is either -1 or +1, perfect prediction is possible. 



Fig. 8.1 Scatter diagram for data of Table 8.1 (horizontal axis: X, IQ). 

8.2 

The linear regression of Y on X 

Any set of paired observations may be plotted on graph paper, each pair 
of observations being represented by a point. Consider the data shown in 
Table 8.1, cols. 2 and 3. These columns contain intelligence quotients 
and reading-test scores for a group of 18 school children. These data are 
plotted in graphical form in Fig. 8.1. While the arrangement of points 
when plotted graphically shows considerable irregularity, we observe a 
tendency for reading-test scores to increase as intelligence quotients 
increase. 

Let us suppose that we are given a child's intelligence quotient only 
and are required to predict his reading-test score. How shall we 
proceed? Clearly, the data show considerable irregularity. An exact 
correspondence between the two sets of scores does not exist. In this 
situation we may proceed by fitting a straight line to the data. This 
straight line provides an average statement about the change in one 
variable with change in the other. It describes the trend in the data and 
is based on all the observations. If, then, we are given a child's 
intelligence quotient and are required to predict his reading-test score, 
we use the properties of the line. The method used in fitting a line to a 
set of points in a situation of this kind is the method of least squares. 
If our interest resides in predicting Y from X, the method of least 
squares locates the line in a position such that the sum of squares of 
distances from the points to the line taken parallel to the Y axis is a 
minimum. This line is known as the regression line of Y on X. 

The general equation of any straight line is given by 

$$Y = bX + a \tag{8.1}$$ 

The quantity a is a constant. It is the distance on the Y axis from the 
origin to the point where the line cuts the Y axis. It is the value of Y 
corresponding to X = 0. If we substitute X = 0 in the equation for a 
straight line, we observe that Y = a. The quantity b is the slope of the 
line. The slope of any line is simply the ratio of the distance in a 
vertical direction to the distance in a horizontal direction, as 
illustrated in Fig. 8.2. The slope describes the rate of increase in Y 
with increase in X. If a and b are known, the location of the line is 
uniquely fixed and for any given value of X we can compute a 
corresponding value of Y. 


Fig. 8.2 The slope of a line. 


Table 8.1 

Calculations for regression line of Y on X for ungrouped data 

   1        2         3          4           5           6 
 Pupil     IQ      Reading                            Expected 
  no.       X      score Y     $X^2$        XY        reading 
                                                      score $Y'$ 

   1      118        66       13,924       7,788         68 
   2       99        50        9,801       4,950         55 
   3      118        73       13,924       8,614         68 
   4      121        69       14,641       8,349         70 
   5      123        72       15,129       8,856         71 
   6       98        54        9,604       5,292         54 
   7      131        74       17,161       9,694         77 
   8      121        70       14,641       8,470         70 
   9      108        65       11,664       7,020         61 
  10      111        62       12,321       6,882         63 
  11      118        65       13,924       7,670         68 
  12      112        63       12,544       7,056         64 
  13      113        67       12,769       7,571         65 
  14      111        59       12,321       6,549         63 
  15      106        60       11,236       6,360         60 
  16      102        59       10,404       6,018         57 
  17      113        70       12,769       7,910         65 
  18      101        57       10,201       5,757         57 

 Sum    2,024     1,155      228,978     130,806 

Where the regression line of Y on X is fitted by the method of least 
squares, the slope of the line $b_{yx}$ and the point where the line cuts the 
Y axis $a_{yx}$ may be calculated by the formulas 

$$b_{yx} = \frac{\sum XY - (\sum X \sum Y)/N}{\sum X^2 - (\sum X)^2/N} \tag{8.2}$$ 

$$a_{yx} = \frac{\sum Y - b_{yx}\sum X}{N} \tag{8.3}$$ 


The quantity $\sum X$ is the sum of X, $\sum Y$ is the sum of Y, $\sum XY$ is 
the sum of products of X and Y, $\sum X^2$ is the sum of squares of X, and N 
is the number of cases. 

To illustrate, consider the data of Table 8.1. Columns 2 and 3 
provide intelligence quotients and reading scores for the 18 school 
children. Column 4 provides the values $X^2$, and col. 5 the products XY. 
Summing the columns, we obtain 

$\sum XY$ = 130,806 
$\sum X$ = 2,024 
$\sum Y$ = 1,155 
$\sum X^2$ = 228,978 
N = 18 





Applying formulas (8.2) and (8.3), we have 

$$b_{yx} = \frac{130{,}806 - (2{,}024 \times 1{,}155)/18}{228{,}978 - (2{,}024)^2/18} = .6708$$ 

$$a_{yx} = \frac{1{,}155 - .6708 \times 2{,}024}{18} = -11.25$$ 

The regression line of Y on X is then described by the equation 

$$Y' = .6708X - 11.25$$ 

The symbol Y' has been introduced to refer to the estimated value of Y, 
that is, the value of Y estimated from a knowledge of X. Y' is a distance 
from the X axis to the line corresponding to any value of X. By 
substituting any value of X in the formula we obtain Y', the estimated 
value of Y. Column 6 of Table 8.1 shows the estimated reading-test scores 
obtained by applying this regression equation. 
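
The fitting of the regression line of Y on X can be sketched as follows, using the IQ and reading scores of Table 8.1 and formulas (8.2) and (8.3). The function and variable names are illustrative. Carried to full precision, the intercept comes out close to the rounded value reported in the text, differing only in the second decimal place.

```python
# Data from Table 8.1: IQ (X) and reading score (Y) for 18 pupils.
IQ = [118, 99, 118, 121, 123, 98, 131, 121, 108,
      111, 118, 112, 113, 111, 106, 102, 113, 101]
READING = [66, 50, 73, 69, 72, 54, 74, 70, 65,
           62, 65, 63, 67, 59, 60, 59, 70, 57]

def regression_y_on_x(xs, ys):
    """Least-squares slope and intercept for predicting Y from X."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    slope = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)  # (8.2)
    intercept = (sum_y - slope * sum_x) / n                            # (8.3)
    return slope, intercept

b_yx, a_yx = regression_y_on_x(IQ, READING)

def predict_reading(iq):
    """Estimated reading score Y' for a given IQ."""
    return b_yx * iq + a_yx

print(f"b_yx = {b_yx:.4f}, a_yx = {a_yx:.2f}")
print(f"predicted reading score for IQ 118: {predict_reading(118):.1f}")
```

Substituting each pupil's IQ into `predict_reading` gives values that round to the expected reading scores in column 6 of the table, apart from cases that sit on a rounding boundary.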


8.3 

The linear regression of X on Y 

Above we have considered the regression of Y on X. The regression line 
has been located in order to minimize the sum of squares of the distances 
from the points to the line taken parallel to the Y axis. Given 
reading-test scores and intelligence quotients, we concerned ourselves 
with predicting reading-test scores from intelligence quotients. If, 
however, we wish to predict intelligence quotients from reading-test 
scores, a different regression line is used. This is the regression line 
of X on Y. This line is located in a position such as to minimize the sum 
of squares of the distances from the points to the line parallel to the X 
axis. We see, therefore, that two regression lines may be fitted to any 
set of paired observations, the regression line of Y on X and the 
regression line of X on Y. The regression of Y on X is used in predicting 
Y from X. The regression of X on Y is used in predicting X from Y. These 
two lines will differ except in the particular case when all the points 
fall exactly on a straight line. Under this circumstance the two 
regression lines coincide. 

The formula for the regression line of X on Y is given by 

$$X' = b_{xy}Y + a_{xy} \tag{8.4}$$ 

The symbol X' is used to refer to the predicted value of X, the value 
estimated from a knowledge of Y; $b_{xy}$ is the slope of the regression 
line, and $a_{xy}$ is the point where the line intercepts the X axis. The 
values of $b_{xy}$ and $a_{xy}$ may be calculated from the formulas 

$$b_{xy} = \frac{\sum XY - (\sum X \sum Y)/N}{\sum Y^2 - (\sum Y)^2/N} \tag{8.5}$$ 

and 

$$a_{xy} = \frac{\sum X - b_{xy}\sum Y}{N} \tag{8.6}$$ 
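
Formulas (8.5) and (8.6) may be applied to the Table 8.1 data in the same way as the formulas for the regression of Y on X. The sketch below is illustrative only; the function name and the choice of a reading score of 65 as an example input are assumptions, and no result for this regression is reported in the text.

```python
# Data from Table 8.1: IQ (X) and reading score (Y) for 18 pupils.
IQ = [118, 99, 118, 121, 123, 98, 131, 121, 108,
      111, 118, 112, 113, 111, 106, 102, 113, 101]
READING = [66, 50, 73, 69, 72, 54, 74, 70, 65,
           62, 65, 63, 67, 59, 60, 59, 70, 57]

def regression_x_on_y(xs, ys):
    """Least-squares slope and intercept for predicting X from Y."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_y2 = sum(y * y for y in ys)
    slope = (sum_xy - sum_x * sum_y / n) / (sum_y2 - sum_y ** 2 / n)  # (8.5)
    intercept = (sum_x - slope * sum_y) / n                            # (8.6)
    return slope, intercept

b_xy, a_xy = regression_x_on_y(IQ, READING)

def predict_iq(reading_score):
    """Estimated IQ X' for a given reading-test score."""
    return b_xy * reading_score + a_xy

print(f"b_xy = {b_xy:.4f}, a_xy = {a_xy:.2f}")
print(f"predicted IQ for a reading score of 65: {predict_iq(65):.1f}")
```

Like the regression of Y on X, this line passes through the point defined by the two means, so a pupil at the mean reading score is predicted to be at the mean IQ.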


8.4 

Regression lines for a bivariate 
frequency distribution 

Where data are grouped in the form of a bivariate frequency table the 
frequencies in each row or each column of the table constitute a frequency 
distribution. Table 8.2 is a bivariate frequency distribution for scores 
on a verbal intelligence test and the Binet intelligence test. We note, for 
example, that 104 individuals make scores between 100 and 109 on the 
Binet. The frequencies in the 100-109 column comprise the frequency 
distribution of scores on the verbal test for all individuals with IQ's 
between 100 and 109 on the Binet. The mean score on the verbal test 
for these 104 individuals can be readily calculated from the distribution in 
the 100-109 column. If we know only that an individual's IQ falls 
between 100 and 109, the best estimate we can make of his verbal-test 
score is that he is at the mean of those individuals with IQ's between 100 
and 109. The means for all column arrays may be calculated. These 
are the mean verbal-test scores of the individuals falling within particular 
class intervals on the Binet scale. A straight line may be fitted to this 
set of means by the method of least squares. This line is the regression of 
Y on X, the regression line used in predicting verbal-test scores from Binet 
IQ's. Similarly, the means for the row arrays may be calculated and a 
line fitted to these means by the method of least squares. This line is 
the regression of X on Y, the regression line used in predicting Binet IQ's 
from verbal-test scores. The two regression lines are shown in Table 8.2. 


8.5 

Relation of regression to correlation 

If all points in a scatter diagram fall exactly along a straight line, the two 
regression lines coincide. Perfect prediction is possible. The 
correlation coefficient in this case is either -1 or +1. Where the 
correlation departs from either -1 or +1, the two regression lines have an 
angular separation. In general, as the degree of relationship between two 
variables decreases, the angular separation between the two regression 
lines increases. Where no systematic relationship exists at all, the two 
variables being independent, the two regression lines are at right angles 
to each other. 

A simple relationship exists between the correlation coefficient and 
the slopes of the two regression lines. The slopes of the regression lines 
when expressed in deviation-score form are given by 

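$$b_{yx} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{(N - 1)s_x^2}$$ 

$$b_{xy} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{(N - 1)s_y^2} \tag{8.7}$$ 

Since $r = \sum (X - \bar{X})(Y - \bar{Y})/(N - 1)s_x s_y$, 

$$b_{yx} = r\frac{s_y}{s_x}$$ 

$$b_{xy} = r\frac{s_x}{s_y} \tag{8.8}$$ 

Multiplying these two expressions, we obtain 

$$b_{yx}b_{xy} = r^2 \tag{8.9}$$ 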

Thus the product of the slopes of the two regression lines is the square of 
the correlation coefficient. The geometric mean of the two slopes is the 
correlation coefficient. 
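
The relation between the two slopes and the correlation coefficient can be verified numerically. The sketch below uses the ten pairs of observations from Table 7.1 and the deviation-score forms of the slopes; the variable names are illustrative.

```python
from math import sqrt

# Paired observations from Table 7.1.
X = [5, 10, 5, 11, 12, 4, 3, 2, 7, 1]
Y = [1, 6, 2, 8, 5, 1, 4, 6, 5, 2]
n = len(X)

mean_x = sum(X) / n
mean_y = sum(Y) / n
sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))
var_x = sum((x - mean_x) ** 2 for x in X) / (n - 1)
var_y = sum((y - mean_y) ** 2 for y in Y) / (n - 1)

# Slopes in deviation-score form, formula (8.7).
b_yx = sum_xy / ((n - 1) * var_x)
b_xy = sum_xy / ((n - 1) * var_y)

# Correlation coefficient in deviation-score form.
r = sum_xy / ((n - 1) * sqrt(var_x) * sqrt(var_y))

print(f"b_yx = {b_yx:.4f}, b_xy = {b_xy:.4f}")
print(f"product of slopes = {b_yx * b_xy:.4f}, r squared = {r ** 2:.4f}")
print(f"geometric mean of slopes = {sqrt(b_yx * b_xy):.4f}, r = {r:.4f}")
```

The geometric mean gives the magnitude of r; its sign is that of the slopes, which always share the sign of the sum of products.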

Because of the above relation between correlation and regression, we 
may write equations for the two regression lines, using the correlation 
coefficient. The two equations are as follows: 

$$Y' = r\frac{s_y}{s_x}(X - \bar{X}) + \bar{Y}$$ 

$$X' = r\frac{s_x}{s_y}(Y - \bar{Y}) + \bar{X} \tag{8.10}$$ 


These are commonly used equations for predicting a raw score on one 
variable from a knowledge of a raw score on another. 

If measurements are represented in standard-score form, the 
correlation coefficient may be written as $r = \sum z_x z_y/(N - 1)$. If the 
pairs of standard scores are plotted graphically and two regression lines 
fitted to the data, the equation of these lines may readily be shown to be 

$$z'_y = rz_x$$ 

$$z'_x = rz_y \tag{8.11}$$ 

where $z'_y$ and $z'_x$ are the predicted or estimated standard scores. Both 
regression lines have the same slope, which is equal to the correlation 
coefficient. In this case the slope of the regression of Y on X relative to 
the X axis is the same as the slope of the regression of X on Y relative to 
the Y axis. 
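
The raw-score and standard-score prediction equations describe the same line in different units. A minimal sketch, assuming the Table 7.1 observations and an arbitrary illustrative input score of 10, shows that a prediction made in standard scores and converted back to raw units agrees with the raw-score equation.

```python
from math import sqrt

# Paired observations from Table 7.1.
X = [5, 10, 5, 11, 12, 4, 3, 2, 7, 1]
Y = [1, 6, 2, 8, 5, 1, 4, 6, 5, 2]
n = len(X)

mean_x = sum(X) / n
mean_y = sum(Y) / n
s_x = sqrt(sum((x - mean_x) ** 2 for x in X) / (n - 1))
s_y = sqrt(sum((y - mean_y) ** 2 for y in Y) / (n - 1))
r = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y)) / ((n - 1) * s_x * s_y)

def predict_raw(x):
    """Predicted raw score Y' from a raw score X."""
    return r * (s_y / s_x) * (x - mean_x) + mean_y

def predict_standard(x):
    """Predicted standard score z'_y from the standard score of X."""
    z_x = (x - mean_x) / s_x
    return r * z_x

x_new = 10  # illustrative input score, chosen arbitrarily
y_from_raw = predict_raw(x_new)
y_from_z = mean_y + s_y * predict_standard(x_new)  # convert z'_y back to raw units

print(f"raw-score equation:      Y' = {y_from_raw:.3f}")
print(f"standard-score equation: Y' = {y_from_z:.3f}")
```

Since the absolute value of r does not exceed 1, the predicted standard score is never further from zero than the standard score on which it is based.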


8.6 

Errors of estimate 

In predicting one variable from a knowledge of another, distances from 
either the X or Y axis to the regression line are used as the predicted 
values. A difference between an observed value and a predicted value 
is an error of estimate. Thus in predicting Y from X, the predicted value 
Y' is a distance from the X axis to the regression line, and the difference 
between the observed value of Y and the predicted value, or Y - Y', 
is an error of estimate. If the pairs of observations, when plotted 
graphically as points, all fall exactly along a straight line, all values of 
Y - Y' = 0 and perfect prediction is possible. If the points appear 
to be arranged at random when plotted graphically, many values of 
Y - Y' will be large. The more accurate the predictions possible, the 
smaller the values of Y - Y' will tend to be. The variance of the errors 
of estimate, that is, of Y - Y', is taken as a measure of the accuracy of 
estimate and is given by 

$$s_{y \cdot x}^2 = \frac{\sum (Y - Y')^2}{N - 1} \tag{8.12}$$ 

The square root of this quantity is the standard error of estimate in 
predicting Y from X. 

The reader should note that $s_{y \cdot x}^2$, as defined above, is not an 
unbiased estimate of $\sigma_{y \cdot x}^2$. The number of degrees of freedom 
associated with the sum of squares $\sum (Y - Y')^2$ is not N - 1 but N - 2, 
there being N - 2 deviations about the regression line which are free to 
vary. Here we have rather arbitrarily defined the standard error of 
estimate using N - 1, and not N - 2, to simplify subsequent exposition. 
For purposes of descriptive statistics it is algebraically more convenient 
to use N - 1 and not N - 2. When, however, a problem of estimation is 
involved, a definition using N - 2 should be used. 

In predieting V from a k* ou ledge of ) , the vaname of the errors of 
estimate is 


2(A - AO 
N - 1 


8.13 


where X is an observed value and X' is a value of A r estimated from 
a knowledge of Y The square root of this quantity is the standard error 
of estimate in predicting A from Y 

The standard error of estimate is related to the correlation coefficient
by the simple relation

\[ s_{y \cdot x} = s_y \sqrt{1 - r^2} \tag{8.14} \]

and similarly

\[ s_{x \cdot y} = s_x \sqrt{1 - r^2} \tag{8.15} \]

By transposing these formulas we obtain relations as follows:

\[ r = \sqrt{1 - \frac{s_{y \cdot x}^2}{s_y^2}} = \sqrt{1 - \frac{s_{x \cdot y}^2}{s_x^2}} \tag{8.16} \]

The above constitutes, in effect, an alternative definition of the correla-
tion coefficient. If all pairs of points when plotted graphically fall
exactly on a straight line, both $s_{y \cdot x}^2 = 0$ and $s_{x \cdot y}^2 = 0$. In consequence,
r will be either +1 or -1, depending on whether we take the positive or
the negative square root. If the points are arranged at random when
plotted graphically, X and Y being independent of each other,
$s_{x \cdot y}^2 = s_x^2$, $s_{y \cdot x}^2 = s_y^2$, and $r = 0$. The value of the correlation is seen,
therefore, to depend on the ratio of two variances, $s_{y \cdot x}^2/s_y^2$ or
$s_{x \cdot y}^2/s_x^2$. These two ratios are equal.
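
The relation between the standard error of estimate and the correlation
coefficient can be checked numerically. The following is a minimal sketch,
not part of the original text: it borrows the five paired measurements of
Exercise 1 of this chapter purely for illustration, fits the least-squares
line for predicting Y from X, and compares the error variance computed
directly from the residuals with $s_y^2(1 - r^2)$. All variable names are
hypothetical.

```python
import math

# Paired measurements borrowed from Exercise 1 of this chapter (illustration only).
X = [1, 5, 6, 6, 2]
Y = [2, 4, 5, 3, 1]
N = len(X)

mean_x = sum(X) / N
mean_y = sum(Y) / N
sum_xx = sum((x - mean_x) ** 2 for x in X)
sum_yy = sum((y - mean_y) ** 2 for y in Y)
sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))

r = sum_xy / math.sqrt(sum_xx * sum_yy)

# Least-squares line for predicting Y from X, and the resulting errors of estimate.
b_yx = sum_xy / sum_xx
predicted = [mean_y + b_yx * (x - mean_x) for x in X]
sum_sq_err = sum((y - yp) ** 2 for y, yp in zip(Y, predicted))

# Descriptive definition (divisor N - 1) and the definition suited to estimation (N - 2).
var_err_descriptive = sum_sq_err / (N - 1)
var_err_unbiased = sum_sq_err / (N - 2)

# The same quantity obtained from the correlation coefficient.
var_y = sum_yy / (N - 1)
var_err_from_r = var_y * (1 - r ** 2)

# Recovering r (in absolute value) from the ratio of the two variances.
r_from_variances = math.sqrt(1 - var_err_descriptive / var_y)

print(round(r, 4), round(var_err_descriptive, 4), round(var_err_from_r, 4))
```

The two routes to the error variance agree only when the divisor $N - 1$ is
used throughout, which is the convention adopted in this section.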


8.7 

The variance interpretation of the 
correlation coefficient 

A correlation coefficient is not a proportion. A coefficient of .60 does not
represent a degree of relationship twice as great as a coefficient of .30.
The difference between coefficients of .40 and .50 is not equal to the
difference between coefficients of .50 and .60. The question arises as to how
correlation coefficients of different sizes may be interpreted. One of the 
more informative ways of interpreting the correlation coefficient is in 
terms of variance. 

A score on Y may be viewed as comprised of two parts, an estimated
value Y' and an error of estimation $(Y - Y')$. Hence

\[ Y = Y' + (Y - Y') \]

These two parts are independent of each other; that is, they are uncor-
related. The variances of the two parts are directly additive, and we may
write

\[ s_y^2 = s_{y'}^2 + s_{y \cdot x}^2 \tag{8.17} \]

where $s_y^2$ = variance of Y
      $s_{y'}^2$ = variance of values of Y predicted from X,
                   that is, values on regression line
      $s_{y \cdot x}^2$ = variance of errors of estimation

The variance $s_{y \cdot x}^2 = s_y^2(1 - r^2)$. By substitution we obtain

\[ s_y^2 = s_{y'}^2 + s_y^2(1 - r^2) \]

Dividing this equation by $s_y^2$ and writing it explicitly for $r^2$, we have

\[ r^2 = \frac{s_{y'}^2}{s_y^2} \]

Similarly, it may be shown that

\[ r^2 = \frac{s_{x'}^2}{s_x^2} \]

These expressions state that $r^2$ is the ratio of two variances, the variance of
the predicted values of Y or X divided by the variance of the observed
values of Y or X.
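
The additive partition of the variance can be illustrated with a short
computation. This is a sketch under stated assumptions rather than part of
the original text: the data are again the five pairs of Exercise 1, the
predicted values come from the least-squares line for Y on X, and all
variances use the divisor $N - 1$ as in this chapter. The variable names are
hypothetical.

```python
import statistics

# Illustrative data (Exercise 1 of this chapter).
X = [1, 5, 6, 6, 2]
Y = [2, 4, 5, 3, 1]

mean_x = statistics.fmean(X)
mean_y = statistics.fmean(Y)
sum_xx = sum((x - mean_x) ** 2 for x in X)
sum_yy = sum((y - mean_y) ** 2 for y in Y)
sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))

# Predicted values on the regression line and the errors of estimation.
b_yx = sum_xy / sum_xx
predicted = [mean_y + b_yx * (x - mean_x) for x in X]
errors = [y - yp for y, yp in zip(Y, predicted)]

var_y = statistics.variance(Y)             # variance of observed Y
var_pred = statistics.variance(predicted)  # variance of predicted values
var_err = statistics.variance(errors)      # variance of errors of estimation

r_squared = sum_xy ** 2 / (sum_xx * sum_yy)

print(round(var_y, 4), round(var_pred + var_err, 4))
print(round(r_squared, 4), round(var_pred / var_y, 4))
```

Each printed pair should agree: the observed variance equals the sum of the
two parts, and $r^2$ equals the predicted variance divided by the observed
variance.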

The variance $s_{y'}^2$ is that part of the variance of Y which can be pre-
dicted from, explained by, or attributed to the variance of X. It is a
measure of the amount of information we have about Y from our informa-
tion about X. If $r = .80$, $r^2 = .64$, and we can state that 64 per cent of
the variance of the one variable is predictable from the variance of the
other variable. We know 64 per cent of what we would have to know to
make a perfect prediction of the one variable from the other. Thus $r^2$
can quite meaningfully be interpreted as a proportion and $r^2 \times 100$ as a
per cent. In general, in attempting to conceptualize the degree of rela-
tionship represented by a correlation coefficient it is more meaningful to
think in terms of the square of the correlation coefficient instead of the
correlation coefficient itself. The values of $r^2 \times 100$ for values of r from
.10 to 1.00 are as follows:

   r      $r^2 \times 100$

  .10         1
  .20         4
  .30         9
  .40        16
  .50        25
  .60        36
  .70        49
  .80        64
  .90        81
 1.00       100


Thus a correlation of .10 represents a 1 per cent association, a correlation 
of .50 represents a 25 per cent association, and the like. A correlation of
.7071 is required before we can state that 50 per cent of the variance of
the one variable is predictable from the variance of the other. With a 
correlation as high as .90 the unexplained variance is 19 per cent. 

The existence of a correlation between two variables is indicative of a 
functional relationship, but does not necessarily imply a causal relation- 
ship. Whether a functional relationship can be regarded as a causal 
relationship is a matter of interpretation. The correlation between the 
intelligence of parents and their offspring has been frequently reported to 
be of the order of .50. This may be interpreted as indicative of a causal 
relationship. Frequently two variables may correlate because both are 
correlated with some other variable or variables. For example, given a 
group of children with a substantial range of ages, a correlation may be 
found between a measure of intelligence and a measure of motor ability. 
Such a correlation may come about because the measures of intelligence 
and motor ability are both correlated with age. If the effects of age are 
removed, the correlation may vanish. 

8.8 

Assumptions underlying the correlation coefficient 

In interpreting the correlation coefficient it is assumed that the fitting of 
two straight regression lines to the data does not distort or conceal the 
functional relation between the two variables. If the relation is curvi- 
linear, a coefficient of zero may be obtained and yet a close relation may 
exist between the two variables. Figure 8.3 shows a curvilinear relation 
between X and Y . If X is known, a fairly accurate prediction can be 
made of Y. If, however, two straight regression lines are fitted to the 
data, these lines will be about at right angles to each other and r will be 
about zero. If a strictly random relation exists between X and Y, the 
correlation will be zero. The above example demonstrates that the con- 
verse does not hold. If the correlation is zero, it does not necessarily fol- 
low that X and Y bear a random relation to each other. This may mean 
that the linear-regression model is a poor fit to the data. In interpreting 
the correlation coefficient it is ordinarily assumed that the linear-regres- 
sion model is a good fit to the data and that a correlation of zero means a 
random relation. Consider a situation where r = .80. This means that 
64 per cent of the variance of the one variable is predictable from the other 
and the residual 36 per cent is due to other factors. The assumption is 
that these factors do not include, at least to any appreciable extent, a 
lack of goodness of fit of the linear-regression lines to the data. If a 
large proportion of the residual 36 per cent did result because of non-
linearity, this would affect the interpretation of the data. In interpreting
a correlation coefficient the investigator should satisfy himself that the
linear-regression lines are a good fit to the data. Any gross departure
from linearity can readily be detected by inspection of the bivariate fre-
quency table. For small values of N, curvilinear relations may be
difficult to detect. In practice, for many of the variables used in psy-
chology and education the assumption of linearity of regression is in
most instances reasonably well satisfied.

Fig. 8.3 Scatter diagram showing curvilinear relation.
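
That a zero correlation need not mean a random relation can be shown with a
small constructed example. The sketch below is not from the original text:
the data are hypothetical and deliberately extreme, with Y determined
exactly by X through $Y = X^2$, and `pearson_r` is a helper written for this
illustration.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


# Hypothetical curvilinear data: Y is known exactly once X is known.
X = list(range(-3, 4))
Y = [x ** 2 for x in X]

r = pearson_r(X, Y)
print(r)
```

Here prediction of Y from X is perfect, yet the linear correlation is zero
because the two straight regression lines cannot represent the curved
relation.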

In calculating a correlation coefficient it need not be assumed that 
the distributions of the two variables are normal. Correlations can be 
computed for rectangular and other types of distributions. If the two var-
iables have different shapes, however, this circumstance will impose con-
straints upon the correlation coefficient. If a positively skewed distribu-
tion is correlated with a negatively skewed distribution, the differences in
the shapes of the distributions will influence the correlation coefficient.
Some part of the departure of the correlation coefficient from unity will
result because of the different shapes of the two distributions. In such a
situation as this the differences in shapes of the distributions will in effect
ensure that one or the other or both regression lines are nonlinear. In
psychological research substantial differences in the shapes of the distri- 
butions under study occasionally are found. Under these circumstances 
it is common practice to transform the variables to a binomial or to an 
approximately normal form. Such transformations will frequently tend 
to eliminate curvilinearity of regression. 





Many other circumstances affect the correlation coefficient. Among 
these may be mentioned sampling error and errors of measurement. The 
effects of these on the correlation coefficient are discussed in later 
chapters. 

EXERCISES 

1 The following are paired measurements:

     X   1   5   6   6   2
     Y   2   4   5   3   1

Compute (a) the correlation between X and Y, (b) the slope of the
regression line for predicting X from a knowledge of Y, (c) the slope of
the regression line for predicting Y from a knowledge of X, (d) the
regression equation for predicting a standard score on X from a stand-
ard score on Y, (e) the regression equation for predicting a raw score on
X from a raw score on Y, (f) the variance of the errors of estimation in
predicting Y from X.

2 The following are marks on a college entrance examination X and first-
year averages Y for a sample of 20 students.

      X    Y       X    Y       X    Y       X    Y

     55   61      70   75      63   85      77   84
     79   72      80   61      64   87      62   72
     59   69      89   79      69   70      85   70
     81   89      92   90      75   90      55   60
     62   52      60   55      84   67      66   67

Compute (a) the correlation between entrance examination marks and
first-year averages, (b) the regression equation for predicting first-year
averages from examination marks, (c) the predicted first-year averages
for the 20 students, (d) the variance of the errors of estimation.

3 Standard scores on variable X for four individuals are -2.0, -1.68,
.18, 1.16. The correlation between X and Y is .50. What are the
estimated standard scores on Y? What is the standard error in esti-
mating standard scores on Y from standard scores on X?

4 From the data $\bar{X} = 40.3$, $\bar{Y} = 12.5$, $s_x = 12.6$, $s_y = 3.6$, and $r_{xy} = .60$,
write the regression equations for predicting Y from X and X from Y.

5 Show that 





6 A correlation of .7071 may be interpreted to mean that 50 per cent of 
the variance of one variable is predictable from the other variable. Is 
this statement correct if the regression lines are not linear? 

7 A variance $s_y^2 = 400$ and the correlation between X and Y is .50.
What is the variance of the errors of estimation in predicting Y from
X? What is the variance of the predicted values?

8 What correlation between X and Y is required in order to assert that
75 per cent of the variance of X depends on the variance of Y?

9 In predicting Y from a knowledge of X, the standard error of estimate
is 5, and the mean of the errors of estimation is zero. Assuming the 
errors to be normally distributed, indicate the limits above and below 
the mean that include 95 per cent of the errors of estimation. 



Sampling


9.1

Introduction

In Chap. 1 the concepts of population and sample were discussed. A
population is any defined aggregate of objects, persons, or events, the
variables used as the basis for classification or measurement being speci-
fied. A sample is any subaggregate drawn from the population. Any
statistic calculated on a sample of observations is an estimate of a cor-
responding population value or parameter. The symbol $\bar{X}$ is used to
refer to the arithmetic mean of X calculated on a sample of size N. The
symbol $\mu$ is used to refer to the mean of the population. Similarly, $s^2$ is
used to refer to the variance in the sample, and $\sigma^2$ is the corresponding
population parameter. $\bar{X}$ is an estimate of $\mu$, and $s^2$ is an estimate of $\sigma^2$.
Likewise, any other statistic calculated on a sample is an estimate of a
corresponding population parameter. In most situations the parameters
are unknown and must be estimated in some manner from the sample
data.

Much statistical work in practice is concerned with the use of sample
statistics as estimates of population parameters, and more particularly
with describing the magnitude of error which attaches to such statistics.
The body of statistical method concerned with the making of statements
about population parameters from sample statistics is called sampling
statistics, and the logical process involved is called statistical inference,
this being a rigorous form of inductive inference. If inferences about
population parameters are to be drawn from sample statistics, certain
conditions must attach to the methods of sampling used.


9.2 

Methods of sampling 

In drawing inferences about the characteristics of populations from 
sample statistics, the assertion is frequently made that the sample should 
be drawn at random from the population. The sample is spoken of as a
random sample. The word “random” is used in at least three ways. It
may refer to our subjective experience that certain events are haphazard 
or completely lacking in order. It may be used in a theoretical sense to 
refer to an assumption about the equiprobability of events. Thus a 
random sample is one such that every member of the population has an 
equal probability of being included in it. When the word is used in this 
way the meaning is assigned to it within the framework of probability 
theory. The word “random” is also used in an operational sense to 
describe certain operations or methods. Thus the drawing of numbers 
from a hat after they have been thoroughly mixed, or the drawing of 
cards from a deck after they have been well shuffled, or certain tech- 
niques used in sweepstakes, lotteries, and other games of chance are 
examples of random operations or methods. Sampling theory in statis- 
tics is based on the theoretical use of the word “random,” that is, on the 
idea of the equiprobability of each population member being included in 
the sample. This equiprobability assumption underlies the derivation 
of many formulas used in sampling statistics. Practical operational 
methods of sampling are frequently such as to ensure that the theo- 
retical assumption of equiprobability is closely approximated in practice. 
If methods of sampling ensure approximate equiprobability, then clearly 
the deductive consequences of a theory based on the idea of equiproba- 
bility can be used in dealing with practical sampling problems. The cor- 
respondence between the consequences of theory and what occurs in 
practice can, of course, always be checked by experiment. 

For a finite population, if the members are listed or catalogued, a 
random sampling procedure may readily be applied. Consider, for exam-
ple, a population containing 2,000 members, from which a sample of 200
members is required. All names, or some identifying code number, may
be entered on slips of paper. These may be placed in a container from
which 200 slips are drawn. A more convenient procedure is to use a
table of random numbers. The members of the population may be
labeled from 1 to 2,000, perhaps according to the order they appear on
the list. Numbers may then be read directly from a table of random
numbers. These numbers will identify which 200 members of
the population of 2,000 should be included in the sample. Tables of
random numbers are found in many texts on statistics or in Fisher and
Yates, Statistical Tables for Biological, Agricultural, and Medical Research.
If a list is arranged alphabetically, or in some other systematic fashion, 
every nth name may be chosen in the construction of the sample. This is 
sometimes spoken of as systematic sampling. Although in most practical 
situations such a sample may be viewed as random, the possibility exists 
that the variable under investigation may not be independent of the
basis of ordering, and a biased sample may result. On occasion, samples
are drawn from lists which do not provide a complete record of all mem- 
bers of the population, but are viewed, perhaps erroneously, as representa- 
tive of the population. A telephone directory is such a list. Names 
chosen from a telephone directory may yield a biased sample of the 
population at large, because ownership or nonownership of a telephone 
may not be independent of the variable under investigation, e.g., how a 
person intends to vote in an election. 
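
The two procedures described above for a listed population can be sketched
in a few lines. This is an illustrative sketch, not the text's own method: a
pseudo-random number generator stands in for the table of random numbers,
the seed is arbitrary, and the names `simple_random_sample` and
`systematic_sample` are hypothetical.

```python
import random

POPULATION_SIZE = 2000
SAMPLE_SIZE = 200

# A seeded generator stands in for a table of random numbers (assumption).
rng = random.Random(42)

# Members are labeled 1 to 2,000 in the order they appear on the list.
labels = range(1, POPULATION_SIZE + 1)

# Simple random sample: every member has the same chance of inclusion.
simple_random_sample = sorted(rng.sample(labels, SAMPLE_SIZE))

# Systematic sample: a random starting point, then every nth name.
interval = POPULATION_SIZE // SAMPLE_SIZE  # here every 10th name
start = rng.randint(1, interval)
systematic_sample = list(range(start, POPULATION_SIZE + 1, interval))

print(simple_random_sample[:5])
print(systematic_sample[:5])
```

The systematic sample is fully determined once the starting point is chosen,
which is why it can be biased when the variable under study is related to
the ordering of the list.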

Forms of modified random sampling are sometimes used. One 
example is stratified random sampling. This procedure requires prior 
knowledge, perhaps obtained from census data, about the number or pro- 
portion of members in the population of various strata. Thus we may 
know the number of males and females, the number in various age groups, 
and the like. In constructing the sample, members are drawn at random 
from the various strata. If the members are drawn such that the pro- 
portions in the various strata in the sample are the same as the propor- 
tions in those strata in the population, the sample is a proportional 
stratified sample. For example, a university may have 10,000 students of
which 7,000 are males and 3,000 are females. A sample of 100 students is
required. We may draw 70 males by a random method from the sub-
population, or stratum, of males, and 30 females from the subpopulation,
or stratum, of females. Such a sample is a proportional stratified sample.
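
The proportional allocation in this example can be written out as a short
computation. The sketch below is illustrative only: the student code numbers
are hypothetical (males are arbitrarily given the first 7,000 numbers), the
seed is arbitrary, and simple rounding is assumed to be adequate, which
holds here but need not hold when the proportions do not divide evenly.

```python
import random

# Stratum sizes from the example: 10,000 students in two strata.
strata_sizes = {"male": 7000, "female": 3000}
sample_size = 100
total = sum(strata_sizes.values())

# Proportional allocation: each stratum's share of the sample matches
# its share of the population.
allocation = {
    name: round(sample_size * size / total)
    for name, size in strata_sizes.items()
}

# Draw at random within each stratum (hypothetical code numbers).
rng = random.Random(7)
sample = {}
first_id = 1
for name, size in strata_sizes.items():
    stratum_ids = range(first_id, first_id + size)
    sample[name] = rng.sample(stratum_ids, allocation[name])
    first_id += size

print(allocation)
```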

In much experimentation using human or animal subjects, the popu- 
lations from which the samples are drawn may not be amenable to precise 
definition, and the methods of sampling described above may have little 
relevance. For example, a laboratory experiment may use a sample of 
experimental animals. Can such a sample be viewed meaningfully as 
a random sample from a known population of animals? In a study of the 
therapeutic effects of different operative procedures applied to a certain 
class of brain tumor, all cases admitted to a particular hospital during a 
specified time period may be used. This number may be small. In 
what sense may this group of cases be viewed as a random sample drawn 
from a larger group or population? Again, in an educational experiment 
the pupils in two or three classes in a particular school and grade may be 
used as subjects. Can such a group of subjects be viewed as a random 
sample or its equivalent? Such samples as these are clearly not random. 
The method by which they are selected is not a random method. Despite 
this the investigator may wish to draw inferences that transcend the 
particular samples under study. He may wish to argue that his findings 
are probably true for some large, although perhaps ill-defined, population 
of experimental animals; or that the therapeutic effects of his operative 
procedures may be extended to all patients suffering from the particular
class of brain tumor; or that the results of the educational experiment
may be generalized to a much larger group of school pupils in the same
grade.

In situations of the above type, where strict random sampling pro- 
cedures have not been used, does any basis exist for valid inference? 
A common practice is to investigate a posteriori a variety of characteris- 
tics of the sample. It may be possible to show that the sample does not 
differ appreciably in these characteristics from a larger group or popula- 
tion. Thus in the educational experiment the sample of students may 
be studied with respect to age, sex, IQ, socioeconomic level of the parents, 
and other characteristics. The sample may not differ much in these 
respects from a larger group or population in the same grade. Because 
the sample shows no bias on a number of known characteristics, that is, it 
may not differ from a random sample as far as these characteristics are 
concerned, the investigator may be prepared to regard it as representative
of the larger group or population and treat it as if it were a random sam- 
ple. Frequently precise knowledge is lacking about a larger reference 
group or population, and the investigator must rely on accumulated past 
experience and intuition in the attempt to detect possible bias. Clearly, 
where possible, random sampling is to be preferred to methods such as 
these. It must be recognized, however, that were we to insist on rigorous 
random sampling methods, much experimentation would not be possible. 

In practice, experiments are clearly not always conducted in the way 
our statistical preconceptions suggest they should be conducted. Experi- 
enced experimentalists are frequently aware of this. Much of the art of 
the experimentalist is concerned with reaching conclusions from data 
which do not satisfy some of the conditions necessary for rigorous inference. 


9.3

Sampling errors 

Let us now consider the nature of the errors associated with particular 
sample values. What precisely is a sampling error? A sampling error is 
a difference between a population value, or parameter, and a particular 
sample value. Thus, if $\mu$ is the population value of the mean and $\bar{X}_i$ is
an estimate based on a random sample of size N, then the difference
$\mu - \bar{X}_i = e_i$, where $e_i$ is a sampling error. Let us suppose that we know
that the mean scholastic-aptitude test score for a population of 5,000
university students is $\mu = 562$. A sample of 100 provides an estimated
mean of $\bar{X}_i = 566$. The sampling error in this case is

\[ \mu - \bar{X}_i = 562 - 566 = -4 \]

Ordinarily, $\mu$ is not known, and we are unable to specify $e_i$ exactly for any
particular sample. Despite this, meaningful statements can be made
about the magnitude of error which attaches to $\bar{X}_i$ as an estimate of the
parameter $\mu$. The reader should note that the concept of error in any
context always implies a parametric, true, fixed, or standard value from 
which a given observed value may depart in greater or less degree. The 
idea that something in the nature of a parametric or true value can 
meaningfully be defined is essential to the concept of error. Without 
some appropriate definition of such a value, the concept of error has no 
meaning, and no theory of error is possible. Also no science is possible. 

How may the magnitude of error be estimated and described? 
Common sense suggests that in the measurement of any quantity some 
appreciation of the magnitude of error may be obtained by repeating the 
measurements a number of times, presumably under constant conditions, 
and observing how these repeated measurements vary from each other. 
Thus in the measurement of the length of a bar of metal a series of sepa-
rate measurements may be made under constant conditions. Let us sup-
pose that five such measurements are 55.95, 56.23, 56.25, 56.41, and 56.54
in. In this case each measurement is an estimate of the same “true”
length; hence the variation observed with repeated measurement is due
to error. Let us suppose that five additional measurements are made
using another measuring operation or procedure, these measurements
being 54.80, 55.31, 56.44, 56.52, 57.29 in. This latter set shows greater 
variation with repetition than the former. We may conclude that the 
magnitude of error associated with this latter set of measurements is 
greater. 

The above example is concerned with errors associated with par-
ticular observations, namely, measurements of a bar of metal. In con-
sidering the magnitude of error associated with, say, a sample mean $\bar{X}_i$,
as an estimate of a population mean $\mu$, the situation is similar. The
problem may be approached experimentally by considering how values of
$\bar{X}_i$ vary in repeated samples of size N. Thus the mean scores on the
scholastic-aptitude test for five different samples of 100 students, drawn
from a population of 5,000, may be 552, 558, 562, 568, and 569. These
five sample means may be viewed as estimates of the same population
mean $\mu$; that is, the value that would have been obtained were information
available on all 5,000 members of the population. The variation 
of these five means one from another may be attributed to sampling 
error. 

In general, a number, say k, of samples of size N may be drawn at
random from the same population and a mean calculated for each sam-
ple. These means may be represented by the symbols
$\bar{X}_1, \bar{X}_2, \bar{X}_3, \ldots, \bar{X}_k$. We may write

\[
\begin{aligned}
\mu - \bar{X}_1 &= e_1 \\
\mu - \bar{X}_2 &= e_2 \\
\mu - \bar{X}_3 &= e_3 \\
&\;\;\vdots \\
\mu - \bar{X}_k &= e_k
\end{aligned}
\]

The variance and the standard deviation are ordinarily used to describe
the magnitude of variation in any set of observations or values. In
describing the magnitude of variation in sample means with repeated
sampling, the variance and standard deviation are also used. These
statistics describe the magnitude of sampling error, that is, the magnitude
of error associated with $\bar{X}_i$ as an estimate of $\mu$. Note that the variance
and standard deviation of sample means $\bar{X}_i$ are the same as the variance
and standard deviation of the sampling error $e_i$, because $\mu$ is a constant.
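
The closing remark, that the sample means and the sampling errors have the
same variance, can be checked by simulation. The following sketch is not
from the original text: the population of 5,000 scores is synthetic (drawn
from a normal distribution with an assumed mean of 562 and an assumed
standard deviation of 100), and the seed and variable names are arbitrary.

```python
import random
import statistics

rng = random.Random(0)

# Synthetic population of 5,000 test scores (assumed shape and spread).
population = [rng.gauss(562, 100) for _ in range(5000)]
mu = statistics.fmean(population)

# Draw k samples of size n without replacement and record each mean.
k, n = 5, 100
sample_means = [statistics.fmean(rng.sample(population, n)) for _ in range(k)]

# Sampling error of each mean, defined as the parameter minus the estimate.
sampling_errors = [mu - m for m in sample_means]

# Subtracting each mean from a constant leaves the variance unchanged.
var_means = statistics.variance(sample_means)
var_errors = statistics.variance(sampling_errors)

print(round(var_means, 6), round(var_errors, 6))
```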


9.4

Sampling distributions 

In the above discussion the problem of estimating error has been 
approached experimentally; that is, we considered the actual drawing of 
a number of samples and approached the experimental study of error 
through observed sample-to-sample fluctuation. Consider for illustrative
purposes a small finite population of eight members. Let the members of
the population be cards numbered from 1 to 8. These cards may be
shuffled, a sample of four cards drawn without replacement, and a mean
calculated for the sample. This procedure may be repeated 100 times,
and a frequency distribution made of the 100 sample means. This
distribution is an experimental sampling distribution, and its standard
deviation is a measure of the fluctuation in means from sample to sample.

A theoretical, as distinct from an experimental, approach may be used.
Given a finite population of eight members, a limited number of different
samples of four cards exist. The number of such samples is the number
of combinations of eight things taken four at a time, or $C_4^8 = 70$. Each
of these 70 samples may be considered equiprobable. The means for the
70 possible samples may be ascertained and a frequency distribution pre-
pared. This frequency distribution is a theoretical sampling distribution.
It is obtained by direct reference to probability considerations. No 
drawing of actual samples is involved. The standard deviation of the 
theoretical sampling distribution is a measure of fluctuation in means 
from sample to sample. 

In the above example the population is small and finite. In practice,
most of the populations with which we deal are indefinitely large, or if
finite, they are so large that for all practical purposes they can be con- 
sidered indefinitely large. In the study of sampling error the approach 
used in dealing with an indefinitely large population is a simple extension 
of that used with a small finite population. The distinction between an 
experimental and theoretical sampling distribution still applies. When 
the population is indefinitely large, the theoretical sampling distribution 
of, for example, the mean is the frequency distribution of means of the 
indefinitely large number of samples of size N which theoretically could
be drawn. 

The theoretical sampling distributions are known for all commonly
used statistics. The standard deviation of the sampling distribution is
called the standard error. Thus a standard error is always a standard
deviation which describes the variability of a statistic over repeated
sampling. The standard deviation of a theoretical sampling distribution
is, in effect, a population parameter. It is descriptive of the variation of
a statistic in a complete population of sample values. The standard
deviation of the theoretical sampling distribution of the mean is repre-
sented by the symbol $\sigma_{\bar{x}}$. In practice this standard deviation must be
estimated from sample data. This estimate in the case of the mean may
be represented by the symbol $s_{\bar{x}}$. For most statistics fairly simple for-
mulas are available for estimating the standard deviation of the theo-
retical sampling distribution.

The theoretical sampling distributions of some statistics are normal, 
or approximately so; others are not. For example, the theoretical sam-
pling distribution of the mean $\bar{X}$ is normally distributed in sampling from
an indefinitely large normally distributed population. The sampling 
distribution of the correlation coefficient presents a complicated problem. 
It is not normally distributed except under certain special circumstances. 
When the shape of the sampling distribution is known, certain kinds of 
statements can be made about a population value from a sample estimate. 
For example, it is possible to fix limits above and below a sample value and 
assert with a known degree of confidence that the population parameter 
falls within those limits. The fixing of such limits requires a knowledge 
of the shape of the sampling distribution. 


9.5

Sampling distribution of means from a 
finite population 

In practice, most samples are viewed as drawn from indefinitely large
populations. The essential ideas of sampling may, however, be con-
veniently illustrated with reference to a small finite population. Sup-
pose, as mentioned above, that we have a population of eight cards
numbered from 1 to 8. These cards may be shuffled, and a sample of
four cards drawn at random. After each card is drawn it is not returned;
that is, the sampling is without replacement. A mean $\bar{X}$ may be calcu-
lated for this sample. The four cards may now be returned, the eight
cards shuffled, another sample of four cards drawn, and another mean
calculated. Let us continue this procedure until 100 samples of four
cards have been drawn and their means calculated. Table 9.1, col. 3,
shows the frequency distribution of 100 such sample means. This dis-
tribution is an experimental sampling distribution of means. It shows
experimentally how the means of samples of four drawn at random with-
out replacement from a population of eight vary from sample to sample.
The mean of the experimental sampling distribution $\bar{X}_{\bar{x}}$, that is, the mean
of the 100 means based on samples of four, is found to be 4.56. The mean
of the population from which the samples have been drawn is the mean of
the integers from 1 to 8 and is 4.50. The standard deviation of the 100
means $s_{\bar{x}}$ is found to be .834.

The investigation of the fluctuation in sample means may be ap- 
proached theoretically. The number of different samples of four in 
sampling without replacement from a population of eight members is the 
number of combinations of eight things taken four at a time, or $C_4^8 = 70$.
These 70 samples may be considered equiprobable. A listing of the 70
samples may readily be made, and the means calculated. The sample
with the smallest mean will be 1, 2, 3, 4; the mean $\bar{X} = 2.50$. The sam-
ple with the largest mean will be 5, 6, 7, 8; here $\bar{X} = 6.50$. Thus $\bar{X}$ will
range from 2.50 to 6.50. Table 9.1, col. 5, shows the frequency distribu-
tion of the 70 sample means. This distribution is a theoretical sampling
distribution of the mean of samples of four from a small finite population 
of eight members. It is based on the idea that there are 70 possible 
combinations of eight things taken four at a time, all combinations being 
equiprobable. 
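
The listing of the 70 samples can be sketched in a few lines of Python. This is an illustrative sketch only; the variable names are assumptions and are not part of the text. It enumerates every combination of four cards, tabulates the means, and computes the mean and standard deviation of the resulting theoretical sampling distribution.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

population = range(1, 9)   # eight cards numbered 1 to 8
sample_size = 4

# Every equiprobable sample of four drawn without replacement.
samples = list(combinations(population, sample_size))
means = [sum(s) / sample_size for s in samples]

# Frequency distribution of the sample means (compare Table 9.1, col. 5).
freq = Counter(means)

mean_of_means = sum(means) / len(means)
sd_of_means = sqrt(sum((m - mean_of_means) ** 2 for m in means) / len(means))

for value in sorted(freq):
    print(f"{value:.2f}  {freq[value]:2d}  {freq[value] / len(means):.3f}")
print(len(samples), mean_of_means, round(sd_of_means, 3))
```

The final line reports the number of samples together with the mean and standard deviation of the 70 means, which may be compared with the values quoted in the text.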

The mean of the theoretical sampling distribution may be calculated. 
This mean $\mu_{\bar{X}}$ is found to be 4.50. The standard deviation $\sigma_{\bar{X}}$ is found
to be .866. These values do not differ markedly from the mean and
standard deviation of the experimental sampling distribution, these being
$\bar{X}_{\bar{X}} = 4.56$ and $s_{\bar{X}} = .834$. Presumably, had a larger number of samples
been drawn, say, 200 or 1,000, the experimental sampling distribution 
would be observed to approximate more closely to the theoretical 
distribution. 

The mean and standard deviation of the theoretical sampling distribution
of Table 9.1 were calculated directly from the 70 possible sample
means. These values may, however, be readily obtained without using
this time-consuming method. It can be shown that the mean of the
theoretical sampling distribution is equal to the population mean; that is,
$\mu_{\bar{X}} = \mu$. In our example the mean of the sampling distribution of samples
of four from a population of eight members is observed to be 4.50.
Likewise the population mean, that is, the mean of the integers from 1 to
8, is also 4.50. The standard deviation of the theoretical sampling
distribution is given by the formula

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{N}} \sqrt{\frac{N_p - N}{N_p - 1}} \qquad (9.1)$$


Table 9.1

Experimental and theoretical sampling distributions of means
of samples of four drawn from a population of eight members

                     Experimental         Theoretical
                     distribution         distribution
  Sum     Mean        f       p            f       p
  (1)     (2)        (3)     (4)          (5)     (6)

  10      2.50         1     .010           1     .014
  11      2.75         2     .020           1     .014
  12      3.00         0     .000           2     .029
  13      3.25         5     .050           3     .043
  14      3.50         7     .070           5     .071
  15      3.75         7     .070           5     .071
  16      4.00         8     .080           7     .100
  17      4.25        11     .110           7     .100
  18      4.50        13     .130           8     .114
  19      4.75        10     .100           7     .100
  20      5.00        10     .100           7     .100
  21      5.25         9     .090           5     .071
  22      5.50         7     .070           5     .071
  23      5.75         4     .040           3     .043
  24      6.00         3     .030           2     .029
  25      6.25         1     .010           1     .014
  26      6.50         2     .020           1     .014

  Total              100    1.000          70     .998




where $\sigma$ = standard deviation in population
$N_p$ = number of members in population
$N$ = sample size

In our example $\sigma$ is the standard deviation of the integers from 1 to 8 and
is equal to 2.29. Population and sample size are, respectively, 8 and 4.
Hence

$$\sigma_{\bar{X}} = \frac{2.29}{\sqrt{4}} \sqrt{\frac{8 - 4}{8 - 1}} = .866$$


If, then, the standard deviation $\sigma$ of the population is known, we can
readily obtain from the above formula the standard deviation of the 
theoretical sampling distribution and use this as a measure of fluctuation 
in means from sample to sample. 
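
As a minimal sketch, the formula for sampling without replacement from a finite population can be wrapped in a small function. The function name `finite_population_se` and its argument names are assumptions made for illustration. The last lines suggest how the correction term becomes negligible when the population is very large relative to the sample.

```python
from math import sqrt
from statistics import pstdev

def finite_population_se(sigma, n, n_pop):
    """Standard deviation of the sampling distribution of the mean when
    samples of size n are drawn without replacement from n_pop members."""
    return (sigma / sqrt(n)) * sqrt((n_pop - n) / (n_pop - 1))

cards = list(range(1, 9))      # the eight cards of the example
sigma = pstdev(cards)          # population standard deviation, about 2.29

se_small_population = finite_population_se(sigma, n=4, n_pop=len(cards))
print(round(sigma, 2), round(se_small_population, 3))

# With a hypothetical population of one million members the correction
# term is very nearly 1, and the result is close to sigma / sqrt(n).
se_large_population = finite_population_se(sigma, n=4, n_pop=1_000_000)
print(round(se_large_population, 4), round(sigma / sqrt(4), 4))
```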

A knowledge of the standard deviation of a theoretical sampling 
distribution is of limited usefulness unless additional information is available
on the shape of the distribution. In certain instances sampling distributions
are normal, or approximately normal, in form. The theoretical
sampling distribution of Table 9.1 departs appreciably from the normal 
form. If, however, both sample and population size were increased, the 
distribution would approximate more closely to the normal form. For 
N = 110 and N p = 100, the normal distribution would be a good approxi- 
mate fit. If the sampling distribution is approximately normal, we can, 
given its standard deviation, readily estimate the probability of obtaining 
values equal to or greater than any given size in random sampling from 
the population. 


9.6 

Sampling distribution of means from an 
indefinitely large population 

Many populations may be conceptualized as comprised of an indefinitely 
large number of members. Most applications of sampling theory encoun- 
tered in psychology and education assume such populations. Sampling 
from an indefinitely large population is essentially the same as sampling 
from a finite population with replacement, that is, where each sample 
member is returned to the population prior to the drawing of the next 
member. In sampling without replacement from an indefinitely large 
population the probabilities remain unchanged regardless of the size of
the sample drawn. Similarly, in sampling from a finite population with 
replacement, the population is not depleted and the probabilities are 
unchanged by the number of prior draws. It follows that problems of
sampling from an indefinitely large population can be approached through
the study of finite populations where samples are drawn with replacement. 

To illustrate sampling from an indefinitely large population, an arti- 
ficial population was constructed. This population was comprised of 
1,611 cards containing the numbers from 1 to 25. The numbers are
approximately normally distributed. The distribution of this
population is shown in Table 9.2. The mean $\mu$ of the population is 13,
and the standard deviation $\sigma$ is 3.56. The cards were inserted in a box,
and samples of 10 cards drawn with replacement; that is, a card was 
drawn, its number noted, and the card then returned to the box before 
the next draw. Altogether 100 samples of 10 cards were drawn, and the 
100 means calculated. Table 9.3 shows the means of the samples. 
Table 9.4, col. 2, shows a frequency distribution of these means. This 
distribution is an experimental sampling distribution of means based on 
samples of size 10. The mean of the sampling distribution $\bar{X}_{\bar{X}}$ is 13.205,
and the standard deviation $s_{\bar{X}}$ is 1.139. This standard deviation is a
description based on experimental data of the sample-to-sample fluctua- 
tion of means of samples of size 10 drawn at random from this population. 

Table 9.2*

Population from which samples were drawn: frequency
distribution of numbers

  Number   Frequency      Number   Frequency

     1          1            14        174
     2          2            15        154
     3          4            16        127
     4          7            17         96
     5         14            18         67
     6         26            19         43
     7         43            20         26
     8         67            21         14
     9         96            22          7
    10        127            23          4
    11        154            24          2
    12        174            25          1
    13        181

                             Total   1,611

The mean and standard deviation of the sampling distribution need
not be estimated by the rather laborious experimental approach described
above. It can be shown that the mean of the theoretical sampling distribution
of the mean in sampling from an indefinitely large population is
equal to the population mean; that is, $\mu_{\bar{X}} = \mu$. It can also be shown that
the standard deviation of the sampling distribution is given by

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{N}} \qquad (9.2)$$

where $\sigma$ = standard deviation in population
$N$ = size of sample


Table 9.3

Means of samples of 10 drawn from the population in Table 9.2

  10.9  13.5  11.7  13.3  13.8  12.5  15.0  1?.7  14.3  12.7
  13.0  13.2  14.0  13.1  13.2  12.7  12.6  11.5  13.2  12.9
  12.4  13.9  14.1  12.2  13.1  11.7  11.5  14.0  12.6  12.9
  13.9  14.0  11.7  12.1  13.2  13.6  14.A  14.0  12.2  13.7
  12.6  11.6  11.8  12.1  13.1  13.2  12.5  U.0   16.4  12.2
  12.6  13.7  13.6  14.0  12.1  13.2  14.8  13.6  12.5  14.5
  14.4  13.9  13.8  15.1  14.2  14.4  13.5  12.7  14.5  14.4
  12.9  11.3  14.5  13.0  12.0  13.3  12.7  14.8  11.3  11.0
  12.7  14.6  15.2  14.1  16.1  14.7  12.3  11.2  14.3  14.7
  12.9  12.3  11.9  14.0  14.5  12.4  11.9  12.3  12.4  12.6


Table 9.4

Experimental and theoretical sampling distribution of 100
sample means for samples of size 10 drawn from the population
of Table 9.2

                            Frequency
  Class interval    Experimental   Theoretical

    16.5-17.4             -             .1
    15.5-16.4             2            1.4
    14.5-15.4            13            8.4
    13.5-14.4            27           24.6
    12.5-13.4            31           34.3
    11.5-12.4            22           22.8
    10.5-11.4             5            7.2
     9.5-10.4             -            1.1

    Total               100           99.9




The reader will observe that the difference between this formula and the 
formula previously given for the standard deviation of the sampling dis- 
tribution for the means of samples from a finite population resides in the 
absence here of the term $\sqrt{(N_p - N)/(N_p - 1)}$. As $N_p$ increases, this
term approaches 1 as a limit. It is equal to 1 when the population is
indefinitely large. The standard deviation of the theoretical sampling
distribution is $\sigma_{\bar{X}} = 3.56/\sqrt{10} = 1.126$. This is very close to the standard
deviation of the experimental sampling distribution $s_{\bar{X}}$, which was
found to be 1.139.
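
The card-drawing experiment can be imitated by simulation. The sketch below is illustrative only: it rebuilds the population from the frequencies of Table 9.2, computes $\sigma/\sqrt{N}$, and draws samples of 10 with replacement. The seed and the number of samples (20,000 rather than the 100 of the text) are arbitrary assumptions chosen so that the experimental value settles close to the theoretical one.

```python
import random
from math import sqrt
from statistics import mean, pstdev

# Frequencies for the numbers 1 to 12 from Table 9.2; 14 to 25 mirror them.
lower_half = [1, 2, 4, 7, 14, 26, 43, 67, 96, 127, 154, 174]
frequencies = lower_half + [181] + lower_half[::-1]
cards = [number
         for number, f in zip(range(1, 26), frequencies)
         for _ in range(f)]

mu = mean(cards)
sigma = pstdev(cards)
n = 10
theoretical_se = sigma / sqrt(n)          # formula (9.2)

rng = random.Random(1)                    # arbitrary seed
n_samples = 20_000                        # assumption: far more than 100
# rng.choices draws with replacement, as in the text.
sample_means = [mean(rng.choices(cards, k=n)) for _ in range(n_samples)]
experimental_se = pstdev(sample_means)

print(len(cards), mu, round(sigma, 2))
print(round(theoretical_se, 3), round(experimental_se, 3))
```

With only 100 samples, as in the text, the experimental value would be expected to wander more widely about the theoretical one.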

The theoretical sampling distribution of the means of samples drawn 
from a normal population is normal. Thus if we know that the population
distribution is normal, we know that the sampling distribution of
means is normal. Regardless of the shape of the population distribution,
the sampling distribution of means will approximate the normal form as
$N$ increases in size. For practical purposes the distribution may be taken
as approximately normal for samples of reasonable size, except in the case
of fairly gross departures of the population from normality. The theoretical
normal frequencies have been calculated for our illustrative example.
These theoretical frequencies are shown in Table 9.4, col. 3. These
are the expected normal frequencies for a normal curve with a mean of
13.00 and a standard deviation of 1.139. The differences between the
experimental and the theoretical normal distribution are not very great.
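
Expected normal frequencies of this kind may be sketched as follows. Two assumptions are made here and are not stated in the text: the real class limits are taken to lie .05 below and above the stated limits, and the standard deviation of the normal curve is taken as $\sigma/\sqrt{N} = 3.56/\sqrt{10}$, about 1.126. Under these assumptions the computed values agree with the theoretical column of Table 9.4 to within about .1, although the text quotes 1.139.

```python
from math import sqrt
from statistics import NormalDist

sigma, n, n_means = 3.56, 10, 100
sampling_dist = NormalDist(mu=13.0, sigma=sigma / sqrt(n))

# Stated class limits 9.5-10.4, 10.5-11.4, ..., 16.5-17.4.
intervals = [(k + 0.5, k + 1.4) for k in range(9, 17)]

expected = []
for lower, upper in intervals:
    # Area of the normal curve between the assumed real class limits.
    area = sampling_dist.cdf(upper + 0.05) - sampling_dist.cdf(lower - 0.05)
    expected.append(n_means * area)

for (lower, upper), frequency in zip(intervals, expected):
    print(f"{lower:4.1f}-{upper:4.1f}  {frequency:5.1f}")
```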

Examination of the formula $\sigma_{\bar{X}} = \sigma/\sqrt{N}$ indicates that the standard
error of the mean is directly related to the standard deviation of the population
and inversely related to the size of sample. Thus the greater the
variation of the variable in the population, the greater the standard error;
also the larger the size of $N$, the smaller the standard error. The standard
error of means of samples of $N = 1$, the smallest sample size possible,
is equal to the population standard deviation. For any fixed value of
$\sigma$ the standard error can be made as small as we like by increasing the size
of the sample.


9.7

Sampling distribution of proportions 

Many problems require the use of proportions. A study of the sampling 
distribution of a proportion may be approached either experimentally or 
theoretically. To illustrate, consider an urn containing a finite number,
$N_p$, of black and white chips. Denote the proportions of black and white
chips by $\theta$ and $1 - \theta$, respectively. Let us draw a large number of samples
of size $N$ at random, without replacement, from the urn, observe
the proportion of black chips in each sample, and make a frequency distribution
of these proportions. This frequency distribution is an experimental
sampling distribution of proportions for samples of size $N$. As
with the arithmetic mean, we may use a theoretical, as distinct from an
experimental, approach. In drawing samples of $N$ from a finite population
of $N_p$ members, the number of different equiprobable samples is
$C_N^{N_p}$. The proportion of black chips may be calculated from each of
these samples, and a frequency distribution made of the proportions.
This distribution is a theoretical sampling distribution of proportions.

For illustrative purposes consider a hypothetical population of three 
black and three white chips. Denote the members of this population by 
$B_1, B_2, B_3, W_4, W_5, W_6$, the subscript identifying the particular population
member. We may consider the set of equiprobable samples of three
members which may be drawn without replacement from this population.
The number of such samples is $C_3^6 = 20$. The first sample is $B_1B_2B_3$,
and the proportion of black chips is $p = 1.00$. The second sample is
$B_1B_2W_4$ with $p = .67$; the third, $B_1B_2W_5$ with $p = .67$, and so on. A
frequency distribution may be made of these proportions, and is as
follows: 


    p        f       f/N

    0        1       .05
   .33       9       .45
   .67       9       .45
  1.00       1       .05

  Total     20      1.00


The standard deviation, or standard error, of this theoretical sampling
distribution may be readily calculated and is $\sigma_p = .224$. This standard
error may be obtained directly by the formula

$$\sigma_p = \sqrt{\frac{\theta(1 - \theta)}{N} \cdot \frac{N_p - N}{N_p - 1}} \qquad (9.3)$$


In the above example $\theta = .50$ since three of the six chips are black.
Population and sample size respectively are $N_p = 6$ and $N = 3$. Hence

$$\sigma_p = \sqrt{\frac{.50 \times .50}{3} \cdot \frac{6 - 3}{6 - 1}} = .224$$


which agrees with the value obtained by direct calculation. 
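
The agreement between direct calculation and the formula can be checked with a short enumeration. The sketch below is illustrative; the names are assumptions. It lists all 20 samples of three chips, tabulates the proportions of black chips, and compares the standard deviation of those proportions with the value given by the formula for a finite population.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

chips = ["B", "B", "B", "W", "W", "W"]   # three black and three white chips
n = 3
n_pop = len(chips)

# Combinations of positions, so that each chip is a distinct member.
samples = list(combinations(range(n_pop), n))
proportions = [sum(chips[i] == "B" for i in s) / n for s in samples]
freq = Counter(round(p, 2) for p in proportions)

mean_p = sum(proportions) / len(proportions)
sd_direct = sqrt(sum((p - mean_p) ** 2 for p in proportions) / len(proportions))

theta = chips.count("B") / n_pop
sd_formula = sqrt(theta * (1 - theta) / n * (n_pop - n) / (n_pop - 1))

print(sorted(freq.items()))
print(round(sd_direct, 3), round(sd_formula, 3))
```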

The discussion above relates to sampling without replacement from
a finite population. As with the arithmetic mean, the term $(N_p - N)/(N_p - 1)$
in formula (9.3) approaches unity for any finite value of $N$ as $N_p$
approaches infinity. Thus this term may be considered equal to unity
for an indefinitely large population, and we obtain

$$\sigma_p = \sqrt{\frac{\theta(1 - \theta)}{N}} \qquad (9.4)$$

as the standard error of a proportion in sampling from an indefinitely large 
population, or in sampling from a finite population with replacement. 

Formula (9.4) may be obtained by reference to the binomial. If the
proportion of black chips in a population is $\theta$, the expected, or theoretical,
sampling distribution of the number of black chips in samples of size $N$, as
distinct from the proportion of black chips, is given by the terms of the
binomial $[\theta + (1 - \theta)]^N$. The mean and standard deviation of this distribution
are $N\theta$ and $\sqrt{N\theta(1 - \theta)}$, respectively. We are interested in
the distribution of the proportion, instead of the number, of black chips
in the samples. To obtain the standard deviation of the distribution of
the proportion, as distinct from the number, of black chips in samples of
size $N$, we multiply $\sqrt{N\theta(1 - \theta)}$ by $1/N$ to obtain $\sigma_p = \sqrt{\theta(1 - \theta)/N}$,
which is the same as formula (9.4) above. To illustrate, let $\theta = .25$ and
$1 - \theta = .75$. The expected distribution of the number of black chips in
samples of size 10 is given by expanding the binomial $(.25 + .75)^{10}$.
The mean in this example is $10 \times .25 = 2.5$, and the standard deviation is
$\sqrt{10 \times .25 \times .75} = 1.37$. The standard deviation of the distribution of
the proportion of black chips in samples of size 10 is obtained by dividing
1.37 by 10 and is .137.
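
The binomial illustration can be reproduced by expanding the terms explicitly. This is a minimal sketch with assumed variable names; it computes the mean and standard deviation from the terms of the binomial themselves rather than from the shortcut formulas, and then converts to the proportion scale.

```python
from math import comb, sqrt

theta, n = 0.25, 10

# Terms of the binomial [theta + (1 - theta)]^n: the probability of
# obtaining k black chips in a sample of n, for k = 0, 1, ..., n.
pmf = [comb(n, k) * theta**k * (1 - theta)**(n - k) for k in range(n + 1)]

mean_count = sum(k * p for k, p in enumerate(pmf))
sd_count = sqrt(sum((k - mean_count) ** 2 * p for k, p in enumerate(pmf)))

# Dividing by n converts the number of black chips to a proportion.
sd_proportion = sd_count / n

print(round(mean_count, 2), round(sd_count, 2), round(sd_proportion, 3))
```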

Formulas (9.3) and (9.4) assume that $\theta$ is known. In practice, $\theta$ is
usually not known and the sample value $p$ is used as an estimate of $\theta$.

9.8 

Sampling distribution of differences 

For certain purposes a knowledge of the sampling distribution of the 
difference between two statistics, such as the difference between two 
arithmetic means or two proportions, is required. To conceptualize the 
sampling distribution of the difference between, say, two arithmetic 
means, let us consider two indefinitely large populations whose means 
are equal; that is, $\mu_1 = \mu_2$. Let $\bar{X}_1$ be the mean of a sample of $N_1$ cases
drawn at random from the first population and $\bar{X}_2$ be the mean of a sample
of $N_2$ cases drawn from the second population. The difference
between means is $\bar{X}_1 - \bar{X}_2$. Since $\mu_1 = \mu_2$, this difference results from
sampling error. A large number of pairs of samples may be drawn, and
a frequency distribution made of the differences. It describes how the
differences between means chosen at random from two populations,
where $\mu_1 = \mu_2$, will vary with repeated sampling. From this distribution
we may estimate the probability of obtaining a difference of any specified
size in drawing samples at random from populations where $\mu_1 = \mu_2$. By
considering an indefinitely large number of pairs of samples we arrive at
the concept of a theoretical sampling distribution of differences between
sample means. In this situation the individual measurements in the two
populations are not paired with one another. The samples are independent.
The means may be viewed as paired at random. No correlation
exists between the pairs of means.

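The variance of the sampling distribution of differences describes how
the differences vary with repeated sampling. Consider the case of independent
samples. If $\sigma_{\bar{X}_1}^2 = \sigma_1^2/N_1$ is the variance of the sampling distribution
of means drawn from one population and $\sigma_{\bar{X}_2}^2 = \sigma_2^2/N_2$ is the
corresponding variance from the other population, then the variance of
the sampling distribution of differences between means is the sum of the
two variances. Thus

$$\sigma_{\bar{X}_1 - \bar{X}_2}^2 = \sigma_{\bar{X}_1}^2 + \sigma_{\bar{X}_2}^2 = \frac{\sigma_1^2}{N_1} + \frac{\sigma_2^2}{N_2} \qquad (9.5)$$

When $\sigma_1^2 = \sigma_2^2 = \sigma^2$, the variances in the two populations being equal,
we may write

$$\sigma_{\bar{X}_1 - \bar{X}_2}^2 = \sigma^2 \left( \frac{1}{N_1} + \frac{1}{N_2} \right) \qquad (9.6)$$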

Consider now a situation where measurements are paired with one
another. Such data arise, for example, where measurements are made on
the same group of subjects under both control and experimental conditions.
The paired measurements may be correlated. In this instance,
in approaching the sampling distribution of differences between means,
we conceptualize two populations of paired measurements with equal
means; thus $\mu_1 = \mu_2$. Denote the correlation between the paired measurements
by the symbol $\rho_{12}$. Samples of size $N$ are drawn at random, and
the differences between means obtained. The distribution of differences
between means for an indefinitely large number of samples is the sampling
distribution of differences for correlated populations.

For correlated populations the variance of the sampling distribution
of differences may be shown to be

$$\sigma_{\bar{X}_1 - \bar{X}_2}^2 = \sigma_{\bar{X}_1}^2 + \sigma_{\bar{X}_2}^2 - 2\rho_{12}\sigma_{\bar{X}_1}\sigma_{\bar{X}_2} \qquad (9.7)$$

where $\rho_{12}$ is the correlation in the population. Note that the formula for
independent samples is a particular case of the more general formula for
correlated samples. It is the particular case which arises when $\rho_{12} = 0$.
In the correlated case $N_1 = N_2 = N$.





Formulas (9.5) to (9.7) are simple applications of the formula for 
the variance of differences. 
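
The relation between the independent and the correlated case can be sketched as a single function. The function name `se_difference` and the numerical values below (a standard deviation of 10 and samples of 25) are hypothetical and chosen only for illustration; the calculation follows the variance form of the general formula, with a correlation of zero giving the independent case.

```python
from math import sqrt

def se_difference(sigma1, n1, sigma2, n2, rho=0.0):
    """Standard error of the difference between two means.

    rho is the population correlation between paired measurements;
    rho = 0 gives the independent-samples case.
    """
    se1 = sigma1 / sqrt(n1)
    se2 = sigma2 / sqrt(n2)
    variance = se1**2 + se2**2 - 2 * rho * se1 * se2
    return sqrt(variance)

# Hypothetical values: both standard deviations 10, both samples of 25.
independent = se_difference(10, 25, 10, 25)
positive = se_difference(10, 25, 10, 25, rho=0.5)
negative = se_difference(10, 25, 10, 25, rho=-0.5)

print(round(independent, 3), round(positive, 3), round(negative, 3))
```

In this sketch a positive correlation reduces the standard error of the difference relative to the independent case, and a negative correlation increases it.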


EXERCISES 

1 Indicate the difference between (a) a random sample and a stratified 
random sample, (b) a stratified random sample and a proportional 
stratified sample, (c) an experimental and a theoretical sampling 
distribution. 

2 How would you proceed to draw a random sample of 100 university 
students? 

3 How would you proceed to draw a systematic stratified sample of 100 
students from the students in a university? 

4 Would a random sample of names selected from a telephone book be 
considered appropriate for the study of voting behavior? 

5 Samples of three cards are drawn at random from a population of eight 
cards numbered from 1 to 8. Obtain (a) the theoretical sampling 
distribution of means, (b) the standard deviation of the sampling dis- 
tribution, (c) the probability of obtaining a mean equal to or greater 
than 7. 

6 The standard deviation of the sampling distribution of $\bar{X}$ is $\sigma/\sqrt{N}$.
What is the standard deviation of the sampling distribution of $N\bar{X}$
or $\Sigma X$?

7 A university has a population of 1,000 students. The standard devia- 
tion of scholastic-aptitude test scores in this population is 80. Calcu- 
late the standard errors of mean scholastic-aptitude test scores for a 
sample of 100 students drawn with and without replacement. 

8 The pages of this book, excluding the Appendix, may be defined as a 
population. Some pages contain no formulas, some one, some two,
and so on. (a) Obtain the population distribution of the number of
formulas per page; that is, ascertain the number of pages with no
formulas, the number with one, with two, and so on. Count only
numbered formulas. (b) Calculate the mean and standard deviation
for the distribution of (a) above. (c) Draw without replacement
five random samples of 30 pages each, using an appropriate random
sampling method. Calculate for each sample the mean number of
formulas per page and the standard deviation of means for the five
samples. How does this standard deviation compare with that
obtained from formula (9.1)? Repeat (c) using a random sample
drawn with replacement, and compare the resulting standard deviation
with that obtained from formula (9.2).

9 The variance of the sampling distribution of the mean for samples of 
100 cases drawn from an indefinitely large population is 20. How 
large should the samples be to reduce this variance by one half? 
How large should the samples be to reduce the standard deviation of
the distribution by one half? 

10 A population consists of six black and two white chips. Obtain the 
frequency distribution of proportions of black chips in samples of four 
drawn from this population. What is the standard deviation of this 
distribution? 

11 Will a negative correlation between paired observations increase or
decrease the standard error of the difference between two means? 



Estimation 


10 


10.1

Introduction


This chapter considers some aspects of the problem of estimating population
parameters from sample values. A distinction is commonly made
between two types of estimates, point estimates and interval estimates. A
point estimate is the value obtained by direct calculation on the sample
values. If in a particular sample the mean $\bar{X} = 26.88$, this is a point
estimate of the parameter $\mu$. Another approach is to specify an interval
within which we may assert with some known degree of confidence that
the population mean lies. Thus, for example, instead of the point estimate
$\bar{X}$ we may perform a simple calculation, which will shortly be described,
and assert with 95 per cent confidence that the population mean falls
within the limits 24.92 and 28.84. These values are called confidence
limits, and the interval they contain is called a confidence interval. Such
an interval is an interval estimate.


10.2 

Properties of estimates 

Methods of estimation are sometimes said to yield unbiased, consistent,
efficient, and sufficient estimates. These are desirable characteristics and
serve as criteria for preferring one method of estimation to another. 

A method of estimation provides an unbiased estimate when the mean 
of a large number of sample values, obtained by repeated sampling, 
approaches the population value in the limit as the number of samples 
increases. This simply means that a statistic is unbiased when it displays 
no systematic tendency to be either greater than or less than the population
parameter; that is, it is not subject to a constant error. The arithmetic
mean is an unbiased estimate. The sample mean $\bar{X}$ exhibits no
systematic tendency to be either greater than or less than the parameter $\mu$.
Stated in somewhat different language, an estimate is unbiased when its
expected value is equal to the parameter it purports to estimate. The
expected value of a statistic is the value we should expect to obtain upon
averaging the values of the statistic over an indefinitely large number of
repeated random samples. It is the mean of the theoretical sampling
distribution. The expected value of the mean, denoted by $E(\bar{X})$, is the
population mean $\mu$. A statistic is a biased estimate when the mean of
repeated sample estimates does not tend toward the population value, but
departs in a systematic fashion from it. The variance obtained by the
formula $\Sigma(X - \bar{X})^2/N$ is a biased estimate of the parameter $\sigma^2$.

A method of estimation is said to yield a consistent estimate if that 
estimate approaches the population parameter more closely as sample size 
increases. The arithmetic mean is a consistent estimate in that it tends 
to draw closer to the population parameter with increase in sample size. 

The efficiency of a method of estimation is related to its sampling 
variance. The relative efficiency of two methods of estimation is the
ratio of the two sampling variances, the larger variance being placed in
the denominator. To illustrate, when the distribution of a variable in
the population is symmetrical and unimodal, both the mean and the
median are estimates of the same population parameter, $\mu$. The sampling
variance of the mean is $\sigma_{\bar{X}}^2 = \sigma^2/N$, and of the median,

$$\sigma_{mdn}^2 = \frac{1.57\sigma^2}{N}$$

The relative efficiency is then

$$\text{Relative efficiency} = \frac{\sigma^2/N}{1.57\sigma^2/N} = .64 \qquad (10.1)$$

Thus for this type of population the mean is more efficient than the
median. Relative efficiency has meaning in terms of sample size. A
median calculated on a sample of 100 cases has a sampling variance equal
to that of a mean calculated on 64 cases. The mean is a more economical
estimate and the saving achieved by its use in preference to the median is
36 cases in 100.
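
The relative efficiency of the median can be examined by simulation. The sketch below is illustrative only and rests on assumptions not in the text: the population is taken to be normal with mean 0 and standard deviation 1, the sample size is 100, and 4,000 samples are drawn with an arbitrary seed. The ratio of the two experimental sampling variances should then fall in the neighbourhood of .64, although it will not reproduce that figure exactly.

```python
import random
from statistics import median, pvariance

rng = random.Random(0)          # arbitrary seed
n, n_samples = 100, 4000        # assumed sample size and number of samples

means, medians = [], []
for _ in range(n_samples):
    # Assumption: a normal population with mean 0 and standard deviation 1.
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    means.append(sum(sample) / n)
    medians.append(median(sample))

# Smaller sampling variance (mean) over the larger (median).
relative_efficiency = pvariance(means) / pvariance(medians)
print(round(relative_efficiency, 2))
```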

A method of estimation is sufficient if it is more efficient than any 
other possible method of estimation, that is, if its sampling variance is 
less. A sufficient method of estimation uses all the information in the 
sample. In this context, the concept of information is assigned a precise 
mathematical meaning. 


10.3 

Confidence intervals for means of large samples 

The calculation of confidence intervals for a mean based on a large sample 
is a relatively simple procedure. The sampling variance of the sampling
distribution of the mean, as previously stated, is $\sigma_{\bar{X}}^2 = \sigma^2/N$. The
population variance $\sigma^2$ is unknown. If we use an unbiased variance $s^2$
as an estimate of $\sigma^2$, our estimate of the sampling variance is $s_{\bar{X}}^2 = s^2/N$
and the estimated standard error is given by

$$s_{\bar{X}} = \frac{s}{\sqrt{N}} \qquad (10.2)$$

This is the commonly used formula for estimating the standard error of
the arithmetic mean.

Consider now the ratio $z = (\bar{X} - \mu)/s_{\bar{X}}$. This ratio is a deviation of
a sample mean from its population mean, divided by an estimate of the
standard deviation of the sampling distribution. It is a standard score.
Assuming the normality of $z$, it is correct to state that the probability is
.95 that the following statement is true:

$$-1.96 < \frac{\bar{X} - \mu}{s_{\bar{X}}} < 1.96 \qquad (10.3)$$

This inequality specifies the confidence interval in standard-score form.
In effect it states that the chances are 95 in 100 that $(\bar{X} - \mu)/s_{\bar{X}}$ falls
between $\pm 1.96$. Ordinarily, however, we are not interested in the
confidence interval in standard-score form, but in raw-score form. To
convert the inequality to raw-score form, we multiply each term by $s_{\bar{X}}$ and
rearrange to isolate $\mu$, obtaining

$$\bar{X} - 1.96s_{\bar{X}} < \mu < \bar{X} + 1.96s_{\bar{X}} \qquad (10.4)$$

This states that the chances are 95 in 100 that $\mu$ falls between $\bar{X} \pm 1.96s_{\bar{X}}$;
thus the upper limit is 1.96 standard error units above the sample mean,
and the lower limit is 1.96 standard error units below the mean. The
figure 1.96 derives, as the reader will recall, from the fact that 95 per cent
of the area of the normal curve falls within the limits $\pm 1.96$ standard
deviation units from the mean. To illustrate the fixing of confidence
intervals, let the mean IQ of a random sample of 100 secondary school
children be 114 and the standard deviation 17. The standard deviation
here is the square root of the unbiased variance estimate. Our estimate
of the standard error of the mean is $s_{\bar{X}} = 17/\sqrt{100} = 1.70$. The 95
per cent confidence interval is then given by $114 \pm 1.96 \times 1.70$. The
upper limit is 117.33 and the lower limit is 110.67. Thus we may assert
with 95 per cent confidence that the population mean falls within these
limits. The 99 per cent confidence limits are given by $\bar{X} \pm 2.58s_{\bar{X}}$.
The figure 2.58 derives from the fact that 99 per cent of the area of the
normal curve falls within the limits $\pm 2.58$ standard deviation units above
and below the mean. In the above example the 99 per cent confidence
limits are given by $114 \pm 2.58 \times 1.70$. These limits are 109.61 and
118.39.
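
The arithmetic of the IQ illustration can be sketched as a small function. The function name `confidence_limits` is an assumption; the multipliers 1.96 and 2.58 are the rounded normal-curve values used in the text.

```python
from math import sqrt

def confidence_limits(sample_mean, s, n, z):
    """Lower and upper confidence limits for a mean from a large sample.

    s is the square root of the unbiased variance estimate and z is the
    normal-curve multiplier (1.96 for 95 per cent, 2.58 for 99 per cent).
    """
    standard_error = s / sqrt(n)
    return sample_mean - z * standard_error, sample_mean + z * standard_error

lower95, upper95 = confidence_limits(114, 17, 100, z=1.96)
lower99, upper99 = confidence_limits(114, 17, 100, z=2.58)

print(round(lower95, 2), round(upper95, 2))
print(round(lower99, 2), round(upper99, 2))
```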

What meaning attaches to the statement that we are 95 per cent 
confident that the actual population mean falls within certain specific 
limits? A particular sample may have a mean $\bar{X} = 26.88$ with 95 per
cent confidence limits 24.92 and 28.84. Another sample of the same
size may have a mean $\bar{X} = 25.68$ with 95 per cent confidence limits
23.72 and 27.64. Presumably we could draw a large number of samples,
obtain a large number of upper and lower limits, and prepare frequency
distributions of these upper and lower limits. These two distributions
would be experimental sampling distributions for the 95 per cent confidence
limits. Without elaborating the details of this situation, we state
that about 95 per cent of the intervals so obtained would include the
population mean and about 5 per cent of the intervals would not include
the population mean. Thus the statement that we are 95 per cent confident
implies that we expect about 95 per cent of our assertions to be
correct and the remaining 5 per cent to be incorrect, or that the odds are
19:1 that the confidence interval includes the population value. The
use of a 95 per cent confidence interval is fairly common. If a greater 
degree of confidence is desired, a 99 per cent interval may be used. This 
interval is, very roughly, 1.3 times as great as the 95 per cent interval. 
Thus as we increase our level of confidence, the interval is increased. 
Likewise, of course, as we decrease the level of confidence, the interval is 
decreased. Any desired level of confidence can be obtained by varying 
the size of the confidence interval. As the confidence level is decreased 
and approaches zero, the confidence interval approaches zero as a limit. 
As the confidence level is increased and approaches 100, the confidence 
interval approaches infinity as a limit. In practice, 95 and 99 per cent 
confidence intervals are widely used. 
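
The repeated-sampling interpretation of a confidence interval can be illustrated by simulation. The population below (normal, with mean 100 and standard deviation 15), the sample size, the number of samples, and the seed are all hypothetical choices made for this sketch. For each sample a 95 per cent interval is formed, and the proportion of intervals that include the population mean is counted.

```python
import random
from math import sqrt
from statistics import mean, stdev

rng = random.Random(42)                 # arbitrary seed
mu, sigma = 100.0, 15.0                 # hypothetical normal population
n, n_samples = 100, 2000

covered = 0
for _ in range(n_samples):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    x_bar = mean(sample)
    # stdev uses the unbiased variance estimate (divisor n - 1).
    standard_error = stdev(sample) / sqrt(n)
    if x_bar - 1.96 * standard_error <= mu <= x_bar + 1.96 * standard_error:
        covered += 1

coverage = covered / n_samples
print(round(coverage, 3))
```

The observed proportion should lie near .95, with some fluctuation from one run of the simulation to another.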

Implicit in the above discussion is the assumption that the ratio 
$(\bar{X} - \mu)/s_{\bar{X}}$ is normally distributed. This ratio is not normally distributed
when $N$ is small, but approaches the normal form as $N$ increases
in size. It is a not uncommon statistical convention to consider a sample
of 30 or more observations as large and a sample of less than 30 as small.
This, of course, is highly arbitrary. 

10.4 

The distribution of t 

In drawing samples from a normal population with mean 4 and variance 
a 2 , the distribution of the ratio 

X-M 



154 


Estimation 


chap, xo 



fig. io. i Distribution of t for various degrees of freedom. 


is normal. This ratio is in standard-score form with zero mean and unit 
standard deviation. It is a deviation of a sample mean from a population 
mean, divided by the standard deviation of the sampling distribution of 
means. Where a 2 is unknown, we estimate it from the sample data, using 
in this instance an unbiased estimate. We obtain thereby an estimate of 
as. Denote this by Sj. We may now consider the ratio 


X — p _ Jt - M 

ls(x -\xy 

S~N{N - 1 ) 


10.5 


This ratio contains the variable sample values $\bar{X}$ and $s_{\bar{X}}$ in the numerator
and denominator, respectively. This is a t ratio. It departs appreciably
from the normal form for small N. Its theoretical sampling distribution
is called the distribution of t. If samples of, say, 5 or 10 members are
drawn from a normal population, a value of t calculated for each sample,
and a frequency distribution of the different values of t prepared, the
resulting distribution will not be normal. It will be symmetrical
but leptokurtic. The theoretical sampling distribution of t for
small N is also symmetrical and leptokurtic. It tapers off to infinity at
the two extremities. It is, however, thicker at the extremities than the
corresponding normal curve. A different t distribution exists for each
number of degrees of freedom. As the number of degrees of freedom
increases, the t distribution approaches the normal form. Figure 10.1
compares the normal distribution with the distribution of t for various
degrees of freedom.
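
The thicker extremities of the t distribution can be seen in a small sampling experiment. The following is a minimal sketch in Python, assuming a standard normal population; the function names, the seed, and the number of trials are illustrative choices and are not taken from the text. It draws repeated samples, forms the t ratio with the unbiased variance estimate, and counts how often the ratio falls beyond plus and minus 1.96.

```python
import math
import random
import statistics


def t_ratio(sample, mu):
    """Return (mean - mu) / s_mean, with s_mean based on the unbiased variance."""
    n = len(sample)
    mean = statistics.fmean(sample)
    s_mean = math.sqrt(statistics.variance(sample) / n)  # variance divides by N - 1
    return (mean - mu) / s_mean


def tail_fraction(n, cutoff=1.96, trials=5000, seed=1):
    """Fraction of simulated t ratios falling beyond +/- cutoff."""
    rng = random.Random(seed)
    beyond = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if abs(t_ratio(sample, 0.0)) > cutoff:
            beyond += 1
    return beyond / trials


small = tail_fraction(5)    # df = 4
large = tail_fraction(60)   # df = 59
print(f"N = 5:  {small:.3f} of the t ratios lie beyond +/-1.96")
print(f"N = 60: {large:.3f} of the t ratios lie beyond +/-1.96")
```

For a normal curve 5 per cent of the area lies beyond these limits. With samples of five the simulated proportion should be noticeably larger, and with samples of 60 it should be close to 5 per cent.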

Hitherto we have considered two theoretical model frequency dis- 
tributions, the binomial distribution and the normal distribution. The 
t distribution is a third theoretical model distribution with wide applica- 
tion to many sampling problems. It was developed originally in 1908 by 
W. S. Gosset, who wrote under the pen name “Student.”

In sampling problems the t distribution is used in a manner directly 
analogous to the normal distribution. In the normal distribution 95 per 
cent of the total area under the curve falls within plus and minus 1.96 
standard deviation units from the mean and 5 per cent of the area falls 
outside these limits. Likewise, 99 per cent of the area under the normal 
curve falls within plus and minus 2.58 standard deviation units from the 
mean and 1 per cent of the area falls outside these limits. In the t dis- 
tribution, the distances along the base line of the curve that include 
95 per cent and 99 per cent of the total area are different for different 
numbers of degrees of freedom. It is customary in tabulating areas under 
the t curve to use degrees of freedom, df, instead of N. While the df 
associated with the sample variance is N − 1, the df associated with other
statistics may be N − 2, N − 3, and the like. Consequently, tables of
t by degrees of freedom instead of N are more generally applicable. The 
distances from the mean, measured along the base line of the t distribu- 
tion, that include 95 per cent and 99 per cent of the total area (analogous 
to the 1.96 and 2.58 of the normal distribution) for selected degrees of 
freedom are as follows: 


     df       95%       99%

      1     12.71     63.66
      2      4.30      9.93
      3      3.18      5.84
      4      2.78      4.60
      5      2.57      4.03
     10      2.23      3.17
     15      2.13      2.95
     20      2.09      2.85
     30      2.04      2.75
    120      1.98      2.62
      ∞      1.96      2.58


Note that as the number of degrees of freedom approaches infinity, t
approaches the values 1.96 and 2.58. The difference between t for about
30 degrees of freedom and t for an indefinitely large number of degrees of
freedom is sometimes interpreted for practical purposes as trivial. A
more complete tabulation of t is given in Table B of the Appendix. A
distinction is often made between large and small sample statistics. This
distinction resides in the fact that the normal distribution is frequently
found to be an appropriate model for use with sampling problems involving
large samples. With small samples the distribution of t provides for
many statistics a more appropriate model.

10.5 

Degrees of freedom 

In the above discussion on the distribution of t, mention is made of the
number of degrees of freedom. This concept was discussed also in Chap.
4, where the sample variance was defined as the sum of squares of deviations
about the arithmetic mean, divided by the number of degrees of
freedom. The degree of freedom concept requires further elaboration.

As stated, and illustrated in Chap. 1, the number of degrees of freedom
is the number of values of the variable that are free to vary. The
measurements 10, 14, 6, 5, and 5, when represented as deviations from a
mean of 8, become +2, +6, −2, −3, −3. The sum of these deviations
is zero. In consequence, if any four deviations are known, the remaining
deviation is determined. The number of degrees of freedom is 4.

This type of situation may be represented in symbolic form. Let $X_1$,
$X_2$, $X_3$ be three measurements with mean $\bar{X}$. The sum of deviations is
$(X_1 - \bar{X}) + (X_2 - \bar{X}) + (X_3 - \bar{X}) = 0$. If $\bar{X}$ and any two of the
values of X are known, the third value of X is determined. The number
of degrees of freedom here is 2. The calculation of the variance and
standard deviation requires the sum of squares of deviations about the
mean, $\sum (X - \bar{X})^2$. Only N − 1 of the values of which this sum of squares is
comprised are free to vary independently. The number of degrees of
freedom associated with the sum of squares is N − 1. Dividing this sum
of squares by the number of degrees of freedom associated with it, as
distinct from the number of observations, yields an unbiased estimate of
the population variance $\sigma^2$. The symbol df is frequently used to represent
degrees of freedom.
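
The restriction on the deviations can be checked numerically. The following short Python sketch uses five measurements with a mean of 8; the variable names are illustrative only. It shows that the last deviation is fixed once the other four are known, and that dividing the sum of squares by N − 1 gives the same value as the standard library's sample variance.

```python
import statistics

scores = [10, 14, 6, 5, 5]          # five measurements with a mean of 8
n = len(scores)
mean = sum(scores) / n
deviations = [x - mean for x in scores]

# The deviations sum to zero, so the last one is fixed by the other four.
implied_last = -sum(deviations[:-1])

sum_of_squares = sum(d ** 2 for d in deviations)
unbiased_variance = sum_of_squares / (n - 1)   # divide by df, not by N

print(deviations)
print(implied_last, deviations[-1])
print(unbiased_variance, statistics.variance(scores))
```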

The number of degrees of freedom depends on the nature of the problem.
In fitting a line to a series of points by the method of least squares
the number of degrees of freedom associated with the sum of squares of
deviations about the line is N − 2. If there are two points only, a
straight line will fit the points exactly and the sum of squares of deviations
about the line will, of course, be zero. No freedom of variation is possible.
With three points df = 1; with 15 points, df = 13. The equation
of a straight line is

$$Y = bX + a$$

where b is the slope of the line and a is the point where it cuts the Y axis.
Both b and a are estimated from the data. It may be said that 2 degrees
of freedom are lost in estimating b and a from the data. If b, a, and any
N − 2 deviations from the line are known, the remaining two deviations
are determined.
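
The loss of 2 degrees of freedom in fitting a line may be illustrated as follows. This is a minimal Python sketch with hypothetical data; the helper names `fit_line` and `residuals` are assumptions introduced for the example. A least-squares line forces the deviations to satisfy two restrictions: their sum is zero, and the sum of their products with X is zero.

```python
def fit_line(xs, ys):
    """Least-squares slope b and intercept a for Y = bX + a."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return b, a


def residuals(xs, ys):
    """Deviations of the observed Y values from the fitted line."""
    b, a = fit_line(xs, ys)
    return [y - (b * x + a) for x, y in zip(xs, ys)]


# Hypothetical data, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.4]
res = residuals(xs, ys)

# Two restrictions hold for any least-squares line (up to rounding error).
print(sum(res))
print(sum(x * r for x, r in zip(xs, res)))

# With only two points the line fits exactly, leaving no freedom to vary.
two_point_res = residuals([1.0, 3.0], [2.0, 5.0])
print(two_point_res)
```

Because two restrictions are imposed, only N − 2 of the deviations are free to vary.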

The concept of degrees of freedom has a geometric interpretation.
A point on a line is free to move in one dimension only and has 1 degree of
freedom. A point on a plane has freedom of movement in two dimensions
and has 2 degrees of freedom. A point in a space of three dimensions has
3 degrees of freedom. Likewise, a point in a space of k dimensions has
k degrees of freedom. It has freedom of movement in k dimensions.

The concept of degrees of freedom is widely used in statistical work 
and will be discussed subsequently in connection with contingency tables
and the analysis of variance. The essence of the idea is simple. The
number of degrees of freedom is always the number of values that are 
free to vary, given the number of restrictions imposed upon the data. It 
seems intuitively obvious that in the study of variation we should concern 
ourselves with the number of values that enjoy freedom to vary within the 
restrictions of the problem situation. 


10.6 

Confidence intervals of means for small samples 

The line of reasoning used in determining confidence intervals for small
samples is similar to that for large samples. With small samples, however,
the distribution of t is used instead of the normal distribution in
fixing the limits of the interval. For large samples the 95 and 99 per cent
confidence intervals for the mean are given, respectively, by $\bar{X} \pm 1.96 s_{\bar{X}}$
and $\bar{X} \pm 2.58 s_{\bar{X}}$. For small samples an unbiased estimate of $\sigma^2$ is used in
estimating the standard error. The value of t used in fixing the limits of
the 95 and 99 per cent intervals will vary, depending on the number of
degrees of freedom. Consider an example where $\bar{X} = 24.26$, $s^2 = 64$,
N = 16, and df = 16 − 1 = 15. On reference to Table B of the Appendix we
observe that for 15 degrees of freedom 95 per cent of the area of the
distribution falls within a t of ±2.13 from the mean. The standard error
using the unbiased variance estimate is $8/\sqrt{15}$. The 95 per cent
confidence limits are given by $24.26 \pm 2.13 \times 8/\sqrt{15}$. These limits are
19.88 and 28.64. We may assert with 95 per cent confidence that the
population mean falls within these limits. The 99 per cent limits are
given by $24.26 \pm 2.95 \times 8/\sqrt{15}$. These limits are 18.16 and 30.36.
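
The procedure may be sketched in a few lines of Python. This is an illustration under stated assumptions, not a general routine: the dictionary holds only the selected critical values of t tabulated in Sec. 10.4, and the function name `confidence_limits` is introduced here for the example. The standard error is passed in directly, so the sketch applies whatever estimate of the standard error the investigator has adopted.

```python
import math

# Two-sided critical values of t (95%, 99%), copied from the table in Sec. 10.4.
T_CRITICAL = {
    1: (12.71, 63.66), 2: (4.30, 9.93), 3: (3.18, 5.84), 4: (2.78, 4.60),
    5: (2.57, 4.03), 10: (2.23, 3.17), 15: (2.13, 2.95), 20: (2.09, 2.85),
    30: (2.04, 2.75), 120: (1.98, 2.62),
}


def confidence_limits(mean, standard_error, df, level=95):
    """Return (lower, upper) limits as mean -/+ t * standard_error."""
    if level not in (95, 99):
        raise ValueError("level must be 95 or 99")
    t95, t99 = T_CRITICAL[df]          # KeyError if df is not tabulated
    t = t95 if level == 95 else t99
    half_width = t * standard_error
    return mean - half_width, mean + half_width


# Figures from the worked example: mean 24.26, standard error 8 / sqrt(15).
se = 8 / math.sqrt(15)
print(confidence_limits(24.26, se, df=15, level=95))
print(confidence_limits(24.26, se, df=15, level=99))
```

For degrees of freedom not listed, a fuller table such as Table B of the Appendix would be needed.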

10.7 

Standard errors and confidence intervals 
of proportions 

The estimate of the standard error of a proportion is given by

$$s_p = \sqrt{\frac{p(1 - p)}{N}} = \sqrt{\frac{pq}{N}} \tag{10.6}$$

where 1 − p = q. Also, it may be readily shown that the standard error
of a per cent is given by

$$s_{\%} = 100\sqrt{\frac{pq}{N}}$$



If it can be assumed that the sampling distribution of a proportion
can be approximately represented by a normal distribution, then the
95 and 99 per cent confidence limits for a proportion are given by $p \pm 1.96 s_p$
and $p \pm 2.58 s_p$, respectively. Whether or not the sampling distribution
can be represented by a normal distribution depends both on
the size of the sample and on the value of p. For any given value of
N the sampling distribution of a proportion becomes increasingly skewed
as p and q depart from .50. Quite clearly, the formula for the standard
error of a proportion should not be used with reference to a normal curve
for extreme values of p and q. It has been suggested that the formula for
the standard error of a proportion should be used only when Np or Nq,
whichever is the smaller, is equal to or greater than 5. Thus when
p = .10 and N = 20, Np = 2. The use of the formula $s_p = \sqrt{pq/N}$
would be considered inappropriate here. When p = .10 and N = 100,
Np = 10. Presumably, here the differences between the binomial and
the normal distribution are quite small and can safely be ignored.
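
The rule of thumb and the confidence limits can be combined in a short Python sketch. The function name `proportion_limits` and the choice to return `None` when the rule is not met are assumptions made for this illustration only.

```python
import math


def proportion_limits(p, n, z=1.96, min_count=5):
    """Normal-approximation limits p -/+ z * sqrt(p * q / n).

    Returns None when the smaller of n*p and n*q is below min_count,
    the case in which the text advises against the normal curve.
    """
    q = 1.0 - p
    if min(n * p, n * q) < min_count:
        return None
    s_p = math.sqrt(p * q / n)
    return p - z * s_p, p + z * s_p


print(proportion_limits(0.10, 20))    # Np = 2: the rule is not satisfied
print(proportion_limits(0.10, 100))   # Np = 10: limits are returned
```

With p = .10 and N = 100 the standard error is .03, so the 95 per cent limits lie .0588 on either side of .10.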

10.8 

Standard errors and confidence intervals of 
other statistics 

The standard error of the median may be estimated by

$$s_{mdn} = \frac{1.253s}{\sqrt{N}}$$

where s is obtained from the unbiased estimate of $\sigma^2$. Confidence limits
at the 95 and 99 per cent levels may be located by taking $\pm 1.96 s_{mdn}$ and
$\pm 2.58 s_{mdn}$ about the sample median. The above formulation assumes
normality of the parent population and a large N. In many situations 
where the median is used, the distribution of the variable is not normal. 
This, indeed, is one of the reasons for using the median instead of the 
mean. In consequence the above formulation is of limited use. Con- 
fidence intervals for the median involving no assumptions about the shape 
of the distribution of the variable in the population, other than its
continuity, have been worked out by Nair (1940). His method is described
by Kenney and Keeping (1954) and Johnson (1949). Given N observations
arranged in ascending order, $X_1, X_2, \ldots, X_N$, the median is
the middle value. The problem of fixing, say, 95 per cent limits is
approached by locating two values of $X_i$ in this ascending series such that
the probability that these values will include the population median is 
not less than .95. 

The standard error of the standard deviation for large samples from
a normal population is estimated by

$$s_s = \frac{s}{\sqrt{2N}}$$

The 95 and 99 per cent confidence limits can readily be obtained by
taking $s \pm 1.96 s_s$ and $s \pm 2.58 s_s$, respectively. In using this formula
a sample substantially greater than 30 should be regarded as large. The
method of determining confidence limits for s based on small samples, and
indeed the method which is perhaps most appropriate in all cases regardless
of size of N, involves a knowledge of the distribution of chi square, or
$\chi^2$. For a simple discussion of this method see Freund (1902) or Johnson
(1949). The application of $\chi^2$ to a variety of statistical problems will be
discussed in Chap. 13.
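
The two large-sample standard errors of this section may be sketched in Python as follows. The sketch assumes the usual large-sample forms, 1.253s/√N for the median and s/√(2N) for the standard deviation, together with a normal parent population; the sample figures are hypothetical and the function names are introduced for the example.

```python
import math


def se_median(s, n):
    """Large-sample standard error of the median, 1.253 * s / sqrt(N)."""
    return 1.253 * s / math.sqrt(n)


def se_standard_deviation(s, n):
    """Large-sample standard error of s, s / sqrt(2N)."""
    return s / math.sqrt(2 * n)


def limits(estimate, standard_error, z=1.96):
    """Limits located z standard errors on either side of the estimate."""
    return estimate - z * standard_error, estimate + z * standard_error


# Hypothetical sample: N = 200, median 50, s = 10.
s, n = 10.0, 200
print(limits(50.0, se_median(s, n)))
print(limits(s, se_standard_deviation(s, n)))
```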

EXERCISES 

1 Using a large-sample procedure, obtain the 95 and 99 per cent confidence
intervals for a mean of 105, where N = 100 and s = 10. Obtain
also the 75 and 85 per cent confidence intervals.

2 A random sample of 400 observations has a mean of 50 and a standard
deviation of 18. Estimate the 95 and 99 per cent confidence limits for
the mean.

3 How is the standard error of the mean affected by tripling sample size? 



4 Estimate for the following data the 95 and 99 per cent confidence
intervals for means:

         $\bar{X}$       N      $\sum (X - \bar{X})^2$

    a    26.2       7         77 n
    b    58.3      11        219 U
    c    40.3      25      1,525.0
    d    « 4       16        444.7
5 Find the value of t for df = 20 such that the proportion of the area (a)
to the right of t is .025, (b) to the left of t is .0000, (c) between the mean
and t is .to, (d) between ±t is .90.

6 Obtain the values required in (a), (b), (c), and (d) of Exercise 5 above
for df = 5.

7 What proportion of the area of the t distribution falls (a) above
t = 3.169 where df = 10, (b) below t = −1.725 where df = 20, (c)
between t = ±3.659 where df = 29, (d) between t = 2.131 and
t = 2.602 where df = 15, (e) between t = −4.541 and t = ** where
df = 3?

8 Obtain the theoretical sampling distribution of proportions in drawing 
samples of eight from a population where the population value of the 
proportion is .00. What is the standard deviation of the sampling 
distribution? 

9 Estimate the 95 and 99 per cent confidence limits for p = .75 where
N = 109.



Tests of Significance: 

Means 


11.1 Introduction

In Chaps. 9 and 10 we considered the sampling error associated with single
sample values. Sampling distributions, standard errors of single sample 
values, and confidence intervals were discussed. In practical statistical 
work in psychology we are infrequently concerned with simply describing 
the magnitude of error associated with single sample values. Experi- 
mental data very often require a comparison and evaluation of two or 
more means, proportions, standard deviations, or other statistics obtained 
from separate samples or from the same sample for measurements
obtained under two or more experimental conditions. To illustrate, an
investigator may wish to explore the effects of a tranquilizing drug on the
estimation of time intervals as part of a study on time perception. He
may administer a drug to an experimental group of subjects and a placebo,
an inactive simulation of the drug, to a control group and measure the
errors in time estimation made by the two groups. The mean error for
the two groups may be calculated. The experiment requires an evaluation
of the difference between these two means. Both means are subject
to sampling error. May the difference between the two means be proba- 
bly ascribed to sampling error, or may it be argued with confidence that 
the drug affects time perception? A decision is required between these 
alternatives. Statistical procedures which lead to decisions of this kind 
are known as tests of significance. 

Tests of significance may be applied to the difference between statistics
calculated on independent samples or between statistics obtained
under different conditions for the same sample. Sometimes a test of 
significance is applied to test the difference between a single sample 
statistic and a fixed value. An example is the procedure used to test 
whether a correlation coefficient is significantly different from zero. In 
this case the fixed value is zero. While many tests of significance involve 
a comparison of two sample statistics, or a single sample statistic and a 
fixed value, such tests can readily be extended to cover situations where 



162 


Tests of significance : means 


chap. 11 


more than two sample statistics are involved For example, m the experi- 
ment mentioned above on the effects of a drug on time perception, the 
experiment could be designed to include the administration of the drug in 
different dosages to different groups of subjects Three or four or five 
different dosages might be used, resulting in three or four or five different 
means. The means could be compared two at a time to ascertain whether 
or not the differences between them could be attributed to sampling 
error. A more efficient form of analysis, the analysis of variance, pro- 
vides a procedure for making an over-all tes 1 in this type of situation. 

11.2

The null hypothesis 

Consider an experiment using an experimental and a control group. A
treatment is applied to the experimental group. The treatment is absent
for the control group. Measurements are made on both groups. Presumably
any significant difference between the two groups can be
ascribed with confidence to the treatment and to no other cause. Let
$\bar{X}_1$ and $\bar{X}_2$ be the means for the experimental and the control group,
respectively. Both means are subject to sampling error. The means
$\bar{X}_1$ and $\bar{X}_2$ are estimates of the population means $\mu_1$ and $\mu_2$. The trial
hypothesis may be formulated that no difference exists between $\mu_1$ and
$\mu_2$. This hypothesis is a null hypothesis and may be written

$$H_0: \mu_1 - \mu_2 = 0 \tag{11.1}$$

The symbol $H_0$ represents the null hypothesis. Very simply, this
hypothesis asserts that no difference exists between the two population
means. Note that the statement $\mu_1 - \mu_2 = 0$ is the same as $\mu_1 = \mu_2$.
Thus an alternative formulation of the hypothesis is to assert that the two
samples are drawn from populations having the same mean. In general,
regardless of the particular statistics used, the null hypothesis is a trial
hypothesis asserting that no difference exists between population parameters.
Thus a null hypothesis about two variances would take the form
$H_0: \sigma_1^2 - \sigma_2^2 = 0$, or $H_0: \sigma_1^2 = \sigma_2^2$.

The logical steps used by an investigator in applying a test of significance
are these. First, he assumes the null hypothesis; that is, he
operates on the trial hypothesis that the treatment applied will have no
effect. Second, he examines the empirical data. Where the hypothesis
pertains to two means he examines the difference between the two means,
$\bar{X}_1 - \bar{X}_2$. Third, the question is asked, what is the probability of obtaining
a difference equal to or greater than the one observed in drawing samples
at random from populations where the null hypothesis is assumed to
be true? In the case of two means, what is the probability of obtaining
a difference equal to or greater than $\bar{X}_1 - \bar{X}_2$ in drawing random samples
from populations where $\mu_1 - \mu_2 = 0$? Fourth, if this probability is small,
the observed result being highly improbable on the basis of the null
hypothesis, the investigator may be prepared to reject the null hypothesis.
This means that the observed difference cannot reasonably be explained
by sampling error and presumably may be attributed to the treatment
applied. Thus the result may be said to be significant. If this probability
cannot be considered small and the observed result is not highly
improbable, then sampling error may account for the difference observed.
Hence we cannot with confidence infer that the difference results from
the treatment applied.
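
The third step, finding the probability of a difference at least as large as the one observed when the null hypothesis is true, can be approximated by a sampling experiment. The following Python sketch is an illustration only. It assumes a normal population with a known standard deviation, takes the difference without regard to sign, and uses a hypothetical experiment with 25 subjects per group; the function name, seed, and number of trials are arbitrary choices.

```python
import random
import statistics


def null_probability(observed_diff, n1, n2, sigma, trials=5000, seed=0):
    """Estimate P(|mean1 - mean2| >= |observed_diff|) when H0 is true.

    Both samples are drawn from one normal population, so mu1 - mu2 = 0.
    """
    rng = random.Random(seed)
    count = 0
    for _ in range(trials):
        mean1 = statistics.fmean([rng.gauss(0.0, sigma) for _ in range(n1)])
        mean2 = statistics.fmean([rng.gauss(0.0, sigma) for _ in range(n2)])
        if abs(mean1 - mean2) >= abs(observed_diff):
            count += 1
    return count / trials


# Hypothetical experiment: 25 subjects per group, population sigma = 10.
for diff in (2.0, 8.0):
    p = null_probability(diff, n1=25, n2=25, sigma=10.0)
    decision = "reject H0" if p <= 0.05 else "do not reject H0"
    print(f"observed difference {diff}: probability {p:.3f} -> {decision}")
```

A small observed difference is readily produced by sampling error alone, whereas a large one is rarely produced in this way, which is the basis of the fourth step.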

In the testing of any statistical hypothesis, it is necessary to specify 
an alternative hypothesis. This alternative is accepted if the initial 
hypothesis is rejected. Thus in the testing of the hypothesis 


$$H_0: \mu_1 - \mu_2 = 0$$


the alternative may be $H_1: \mu_1 - \mu_2 \neq 0$. Under certain circumstances,
as described in detail in Sec. 11.5, some advantage may attach to a test
of a null hypothesis against the alternative $\mu_1 - \mu_2 > 0$, or the alternative
$\mu_1 - \mu_2 < 0$. The alternative hypothesis under consideration should be
clearly recognized. 


11.3 

Two types of error 

In reaching a decision about the null hypothesis $H_0$, two types of error
may arise. An alternative $H_1$ may be accepted when the null hypothesis
$H_0$ is true. This is called a Type I error. The null $H_0$ may be accepted
when an alternative hypothesis $H_1$ is true. This is called a Type II error.
The probabilities of committing Type I and Type II errors are represented,
respectively, by $\alpha$ and $\beta$.

The situation may be represented as follows:

                  $H_0$ is true          $H_1$ is true

    Accept $H_1$      $\alpha$                 correct decision
    Accept $H_0$      correct decision     $\beta$

When we accept $H_1$, and $H_1$ is true, this is a correct decision about
nature. When we accept $H_0$, and $H_0$ is true, this is also a correct decision
about nature. The acceptance of $H_1$ when $H_0$ is true and the acceptance
of $H_0$ when $H_1$ is true are both errors. Both are incorrect decisions about
nature.

The situation above is somewhat analogous to that which arises in 
the acceptance or rejection of applicants for employment or admission 
to a university. For example, a university admissions officer may accept 
applicants who subsequently are successful in the university, or he may
reject applicants who would have failed if they had been accepted. In
either case a correct decision may be said to have been made. On the
other hand, the admissions officer may reject an applicant who would
have been successful if he had been accepted, or he may accept an applicant
who subsequently fails. In both instances an error has been made.


11.4 

Levels of significance 

The probability of Type I, or $\alpha$, error is called the level of significance of a
test. Ordinarily the investigator adopts, perhaps rather arbitrarily, a
particular level of significance. It is a common convention to adopt
levels of significance of either .05 or .01. If the probability is equal to or
less than .05 of asserting that there is a difference between two means, for
example, when no such difference exists, then the difference is said to be
significant at the .05, or 5 per cent, level or less. Here the chances are
5 in 100, or less, that the difference could result when the treatment
applied is having no effect. If the probability is .01, or less, the difference
is said to be significant at the .01, or 1 per cent, level. The .05 and
.01 probability levels are descriptive of our degree of confidence that a
real difference exists, or that the observed difference is not due to the
caprice of sampling. Usually in evaluating an experimental result, it is
unnecessary to determine the probabilities with a high degree of accuracy.
For most practical purposes it is sufficient to designate the probability
as p < .05, or p < .01, or possibly p < .001 if the result is highly
significant.

Decisions regarding the rejection, or otherwise, of the null hypothesis
at a level of significance $\alpha$ are commonly made without reference to Type
II, or $\beta$, error. A detailed description of the functioning of $\beta$ error is
beyond the scope of this book. A few comments, however, may be
appropriate. Consider, for example, the difference between two means,
$\mu_1$ and $\mu_2$. For any particular value of $\alpha$, say .05, the value of $\beta$ is a function
of sample size N and the actual difference between $\mu_1$ and $\mu_2$. For
specified $\alpha$ and N, the value of $\beta$, which is the probability of failing to
recognize the existence of a difference when such a difference exists, will
decrease with increase in the difference between $\mu_1$ and $\mu_2$. This means
that the larger the difference between $\mu_1$ and $\mu_2$ the less likely we are to
accept $H_0$. For a specified difference between $\mu_1$ and $\mu_2$ and a specified N
the value of $\beta$ will increase as $\alpha$ decreases. Accordingly, if too strict a level
of significance is adopted, we may fail to reject the null hypothesis when in
fact a fairly large difference between $\mu_1$ and $\mu_2$ exists. For any specified
difference between $\mu_1$ and $\mu_2$, and any $\alpha$, the Type II error is a function of
sample size N. The smaller the sample the greater the value of $\beta$. This
means that although a large difference may exist between $\mu_1$ and $\mu_2$, it
may be difficult to prove for small samples.
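
These three relations can be illustrated numerically. The Python sketch below is a simplification and not a general treatment of Type II error: it assumes a nondirectional test based on the normal distribution, two independent samples of equal size, and a known common standard deviation. The function name `beta` and the figures used are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def beta(true_diff, n, alpha=0.05, sigma=1.0):
    """Type II error probability for a nondirectional z test of mu1 - mu2 = 0.

    Assumes two independent samples of n cases each and a known common sigma.
    """
    z_crit = STANDARD_NORMAL.inv_cdf(1 - alpha / 2)
    # True difference expressed in standard-error units.
    shift = true_diff / (sigma * sqrt(2 / n))
    # H0 is retained when the test statistic falls between -z_crit and +z_crit.
    return STANDARD_NORMAL.cdf(z_crit - shift) - STANDARD_NORMAL.cdf(-z_crit - shift)


# A larger true difference gives a smaller beta.
print(beta(0.5, n=20), beta(1.0, n=20))
# A stricter level of significance gives a larger beta.
print(beta(0.5, n=20, alpha=0.05), beta(0.5, n=20, alpha=0.01))
# A larger sample gives a smaller beta.
print(beta(0.5, n=20), beta(0.5, n=80))
```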

Although, quite clearly, failure to reject the null hypothesis does not
imply that the null hypothesis is true, many investigators exhibit an
inclination to conclude, even for quite small samples, that no difference,
or a trivial difference, exists when a required level of significance is not
achieved. Our discussion of Type II error clearly indicates that such conclusions
are unwarranted. It would, of course, be possible to establish
dual criteria such that if $\alpha$ were greater than, say, .05 and $\beta$ were less than,
say, .05, the investigator, in practice, might be allowed to conclude that
the difference was nonexistent or inconsequential.

11.5

Directional and nondirectional tests 

An investigator may wish to test the null hypothesis, $H_0: \mu_1 - \mu_2 = 0$,
against the alternative, $H_1: \mu_1 - \mu_2 \neq 0$. This means that if $H_0$ is
rejected, the decision is that a difference exists between the two means.
No assertion is made about the direction of the difference. Such a test
is a nondirectional test. A test of this kind is sometimes called a two-tailed
or two-sided test, because if the normal distribution, or the distribution
of t, is used, the two tails, or the two sides, of the distribution are
employed in the estimation of probabilities. Consider a 5 per cent significance
level. If the sampling distribution is normal, 2.5 per cent of
the area of the curve falls to the right of 1.96 standard deviation units
above the mean, and 2.5 per cent falls to the left of 1.96 standard deviation
units below the mean. The area outside these limits is 5 per cent of
the total area under the curve. Under the null hypothesis the chances
are 2.5 in 100 of getting a difference of 1.96 standard deviation units or more in
one direction because of chance factors alone, and 2.5 in 100 in the other
direction. Hence the chances in either direction are 5 in 100. Thus for
significance at the 5 per cent level for a nondirectional test, when the
sampling distribution is normal, the observed difference must be equal to
or greater than 1.96 times the standard deviation of the sampling distribution
of differences. For significance at the 1 per cent level a value of
2.58 is required for a nondirectional test. A nondirectional test is appropriate
if concern is with the absolute magnitude of the difference, that is,
with the difference regardless of sign.

Under certain circumstances we may wish to make a decision about
the direction of the difference. It has been argued that few instances
exist in scientific work where the traditional nondirectional test is of
interest. If concern is with the direction of the difference we may test
the hypothesis $H_0: \mu_1 - \mu_2 \le 0$ against the alternative $H_1: \mu_1 - \mu_2 > 0$,
or the hypothesis $H_0: \mu_1 - \mu_2 \ge 0$ against the alternative $H_2: \mu_1 - \mu_2 < 0$.
Note that the symbol $H_0$ has been used to denote three different hypotheses:
an hypothesis of no difference, an hypothesis of equal to or less than,
and an hypothesis of equal to or greater than. Conventionally the term
null hypothesis has been restricted to an hypothesis of no difference.
It is not inappropriate, as pointed out by Kaiser (1960), to extend the
meaning of the null hypothesis to include hypotheses of equal to or less
than and equal to or greater than. Such tests are directional, one-sided,
or one-tailed, tests. If the normal, or t, distribution is used, one side or
one tail only is employed to estimate the required probabilities. To
reject $H_0: \mu_1 - \mu_2 \le 0$ and accept $H_1: \mu_1 - \mu_2 > 0$, using the normal
distribution, a normal deviate $\ge +1.64$ is required for significance at
the .05 level. Likewise to reject $H_0: \mu_1 - \mu_2 \ge 0$ and accept $H_2: \mu_1 - \mu_2 < 0$,
the corresponding normal deviate is $\le -1.64$. The figure 1.64
derives from the fact that for a normal distribution 5 per cent of the area
of the curve falls beyond +1.64 standard deviation units above the mean,
and 5 per cent beyond −1.64 standard deviation units below the mean.
For significance at the .01 level for a directional test, normal deviates
$\ge +2.33$ or $\le -2.33$ are required.

Kaiser (1960) discusses a directional two-sided test, which in the end
result uses much the same procedure as the one described above. His
procedure requires a decision between three hypotheses, $H_0: \mu_1 - \mu_2 = 0$,
$H_1: \mu_1 - \mu_2 > 0$, and $H_2: \mu_1 - \mu_2 < 0$. The rules here, at the 5 per cent
significance level, are that we decide upon $H_0$ when the normal deviate
falls between −1.64 and +1.64. We decide upon $H_1$ when the normal
deviate is greater than +1.64 and upon $H_2$ when the normal deviate is
less than −1.64. This test amounts in practice to the same thing as
making two directional one-sided tests. Involved in its development are
errors of the third kind. These relate to a decision about a difference in
the wrong direction.
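
The normal deviates quoted in this section for nondirectional and directional tests can be recovered from the standard normal distribution. The following Python sketch uses the standard library's `statistics.NormalDist`; the two function names are introduced for the illustration.

```python
from statistics import NormalDist

z = NormalDist()   # standard normal: mean 0, standard deviation 1


def nondirectional_cutoff(alpha):
    """Deviate with alpha / 2 of the area beyond it in each tail."""
    return z.inv_cdf(1 - alpha / 2)


def directional_cutoff(alpha):
    """Deviate with alpha of the area beyond it in one tail."""
    return z.inv_cdf(1 - alpha)


for alpha in (0.05, 0.01):
    print(alpha,
          round(nondirectional_cutoff(alpha), 2),
          round(directional_cutoff(alpha), 2))
```

Rounded to two places, the values agree with the 1.96 and 2.58 used for nondirectional tests and the 1.64 and 2.33 used for directional tests.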

When is it appropriate to use a directional as distinct from a nondirectional
test? This question is open to some controversy. Clearly
there are many instances in research where the direction of the differences
is of substantial interest; indeed, it has been argued that there are few,
if any, instances where the direction is not of interest. At any rate it is
the opinion of this writer that directional tests should be used more
frequently.


11.6 

Significance of the difference between two 
means for independent samples 

Let $\bar{X}_1$ and $\bar{X}_2$ be two sample means based on $N_1$ and $N_2$ cases, respectively.
We proceed by combining the data for the two samples to obtain
the best unbiased estimate of the population variance. This estimate is
obtained by adding together the two sums of squares of deviations about
the two sample means and dividing this by the total number of degrees of
freedom. This unbiased estimate of the population variance may be
written as

$$s^2 = \frac{\sum^{N_1} (X - \bar{X}_1)^2 + \sum^{N_2} (X - \bar{X}_2)^2}{N_1 + N_2 - 2}$$

The two terms in the numerator are sums of squares of deviations about
the means of the two samples of $N_1$ and $N_2$ cases, respectively. The
total number of degrees of freedom on which $s^2$ is based is $N_1 + N_2 - 2$.
We lose two degrees of freedom because deviations are taken separately
about the means of the two samples. The unbiased variance estimate
$s^2$ is used to obtain an estimate of the standard error of the difference
between the two means. This standard error is given by

$$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{s^2}{N_1} + \frac{s^2}{N_2}} \tag{11.3}$$

The difference between means, $\bar{X}_1 - \bar{X}_2$, is then divided by this estimate
of the standard error to obtain the ratio

$$t = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{s^2/N_1 + s^2/N_2}}$$

This ratio has a distribution of t with $N_1 + N_2 - 2$ degrees of freedom.
The values of t required for significance at the .05 and .01 levels will vary,
depending on the number of degrees of freedom, and may be obtained by
consulting Table B of the Appendix.

The formula given above for $s^2$ is not very convenient from a computational
viewpoint. A more convenient formula is

$$s^2 = \frac{\sum X_1^2 - (\sum X_1)^2/N_1 + \sum X_2^2 - (\sum X_2)^2/N_2}{N_1 + N_2 - 2}$$

This, if desired, may be further modified by writing

$$\frac{(\sum X_1)^2}{N_1} = N_1\bar{X}_1^2 \quad \text{and} \quad \frac{(\sum X_2)^2}{N_2} = N_2\bar{X}_2^2$$

Let the following be error scores obtained for two groups of experimental
animals in running a maze under different experimental conditions:

    Group I     16    9    4   23   19   10    5    2
    Group II    20    5    1   16    2    4

The following statistics are calculated from these data:

                  Group I    Group II

    $N$               8          6
    $\sum X$         88         48
    $\bar{X}$        11          8
    $\sum X^2$    1,372        702

The unbiased estimate of variance is

$$s^2 = \frac{1{,}372 - 88^2/8 + 702 - 48^2/6}{8 + 6 - 2} = 60.17$$

The t ratio is then

$$t = \frac{11 - 8}{\sqrt{60.17/8 + 60.17/6}} = .72$$

The number of degrees of freedom in this example is 8 + 6 − 2 = 12.
For 12 degrees of freedom a t equal to 2.179 is required for significance
at the .05 level. In this example the difference between means is not
significant. No adequate grounds exist for rejecting the null hypothesis.
We are not justified in drawing the inference from these data that the two
experimental conditions are exerting a differential effect on the behavior
of the animals.
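
The computation for the maze example may be sketched in Python as follows. The scores are read so that they reproduce the totals of 88 and 48 and the sums of squares of 1,372 and 702 used in the example; the function names are introduced for the illustration, and the critical value 2.179 is the tabled figure for 12 degrees of freedom quoted above.

```python
import math

group_1 = [16, 9, 4, 23, 19, 10, 5, 2]
group_2 = [20, 5, 1, 16, 2, 4]


def sum_of_squares(scores):
    """Sum of squared deviations about the mean, by the computational formula."""
    return sum(x * x for x in scores) - sum(scores) ** 2 / len(scores)


def pooled_t(a, b):
    """Return (t, df, pooled variance) for two independent samples."""
    df = len(a) + len(b) - 2
    s2 = (sum_of_squares(a) + sum_of_squares(b)) / df
    standard_error = math.sqrt(s2 / len(a) + s2 / len(b))
    mean_diff = sum(a) / len(a) - sum(b) / len(b)
    return mean_diff / standard_error, df, s2


t, df, s2 = pooled_t(group_1, group_2)
print(round(s2, 2), round(t, 2), df)

# Nondirectional test at the .05 level, using the tabled value for df = 12.
significant = abs(t) >= 2.179
print(significant)
```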

The t test described here assumes that the distributions of the variables
in the populations from which the samples are drawn are normal. It
assumes also that these populations have equal variances. This latter
condition is referred to as homogeneity of variance, or homoscedasticity.
The t test should be used only when there is reason to believe that the
population distributions do not depart too grossly from the normal form
and the population variances do not differ markedly from equality.
Tests of normality and homogeneity of variance may be applied, but these
tests are not very sensitive for small samples.


11.7

Significance of the difference between two means
for correlated samples

Consider a situation where a single group of subjects is studied under
two separate experimental conditions. The data may, for example, be
autonomic response measures under stress and nonstress or measures of
motor performance in the presence or absence of a drug. The data are
composed of pairs of measurements. These may be correlated. This
circumstance leads to a test of significance between means different from
that for independent samples. A procedure for testing significance may
be applied without actually computing the correlation coefficient between
the paired observations. This method is sometimes called the difference
method. Its nature is simply described. Given a set of N paired
observations, the difference between each pair may be obtained. Denote any
pair of observations by $X_1$ and $X_2$ and the difference between any pair
$X_1 - X_2$ by D. The mean difference over all pairs is $(\sum D)/N = \bar{D}$.
It is readily observed that the difference between the means of the two
groups of observations is equal to the mean difference. The difference
between any pair of observations is $X_1 - X_2 = D$. Summing over N
pairs yields $\sum X_1 - \sum X_2 = \sum D$. Dividing by N, we obtain
$\bar{X}_1 - \bar{X}_2 = \bar{D}$.
Since the mean difference is the difference between the two means, we
may test for the significance of the difference between means by testing
whether or not $\bar{D}$ is significantly different from zero. Here in effect we
treat the D's as a variable and test the difference between the mean of this
variable and zero.

An unbiased estimate of the variance of the D's is given by

$$s_D^2 = \frac{\sum (D - \bar{D})^2}{N - 1} \tag{11.6}$$

where N is the number of paired observations. Using this unbiased
estimate, the sampling variance of $\bar{D}$ is given by

$$s_{\bar{D}}^2 = \frac{s_D^2}{N} \tag{11.7}$$

To test whether $\bar{D}$ is significantly different from zero, we divide
$\bar{D}$ by its standard error to obtain

$$t = \frac{\bar{D}}{s_{\bar{D}}} \tag{11.8}$$

The number of degrees of freedom used in evaluating t is one less than the
number of pairs of observations, or N - 1. The reader should note that
the $\bar{D}$ in the numerator of the above formula is in effect $\bar{D} - 0$,
which is of course $\bar{D}$. This test is concerned with the significance of
the departure of $\bar{D}$ from zero.

The above formula for t is not convenient computationally. A more
convenient computational formula is

$$t = \frac{\sum D}{\sqrt{[N\sum D^2 - (\sum D)^2]/(N - 1)}} \tag{11.9}$$

The data below are those obtained for a group of 10 subjects on a
choice-reaction-time experiment under stress and nonstress conditions, the
stress agent being electric shock. The figures are the number of false
reactions over a series of trials. The problem here is to test whether the
means under the two conditions are significantly different.

Subject     Stress     Nonstress       D       $D^2$
            $X_1$        $X_2$

   1           7           5           2         4
   2           9          15          -6        36
   3           4           7          -3         9
   4          15          11           4        16
   5           6           4           2         4
   6           3           7          -4        16
   7           9           8           1         1
   8           5          10          -5        25
   9           6           6           0         0
  10          12          16          -4        16

Sum           76          89         -13       127
Mean        7.60        8.90       -1.30

The means are 7.60 and 8.90. The difference between them is equal to the
mean of the differences, or -1.30. The sum and sum of squares of D are,
respectively, -13 and 127. Hence

$$t = \frac{-13}{\sqrt{[10 \times 127 - (-13)^2]/(10 - 1)}} = -1.18$$

We may ignore the negative sign of t and consider only its absolute
magnitude. The number of degrees of freedom associated with this value of
t is 9. For 9 degrees of freedom we require a t of 2.262 for significance at
the 5 per cent level. The observed value of t is well below this, and the
difference between means is not significant. We cannot justifiably argue
from these data that the mean number of false reactions under the two
conditions is different.
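
The difference method may be sketched in Python as follows. This is a
minimal illustration, not a definitive routine: the function name
`paired_t` is an assumption introduced here, and the two lists hold the
false-reaction counts of the 10 subjects, the nonstress score for each
subject being the stress score minus the tabled difference D.

```python
import math

# False reactions for the 10 subjects (nonstress = stress - D).
stress = [7, 9, 4, 15, 6, 3, 9, 5, 6, 12]
nonstress = [5, 15, 7, 11, 4, 7, 8, 10, 6, 16]

def paired_t(x1, x2):
    """Illustrative helper (name assumed): (t, df) by the computational formula 11.9."""
    diffs = [a - b for a, b in zip(x1, x2)]
    n = len(diffs)
    sum_d = sum(diffs)
    sum_d2 = sum(d * d for d in diffs)
    t = sum_d / math.sqrt((n * sum_d2 - sum_d ** 2) / (n - 1))
    return t, n - 1

t, df = paired_t(stress, nonstress)
print(round(t, 2), df)
```

The sign of the result depends only on which condition is subtracted from
which; for a nondirectional test the absolute value is compared with the
tabled t for N - 1 degrees of freedom.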

The method described above takes into account the correlation
between the paired measurements. This results because the variance of
differences is related to the correlation between the paired measurements
by the formula

$$s_D^2 = s_1^2 + s_2^2 - 2r_{12}s_1s_2 \tag{11.10}$$

When $s_1$, $s_2$, and $r_{12}$ have been computed, as will frequently be the
case, the variance of differences $s_D^2$ can be readily obtained from the
above formula and need not be obtained by direct calculation on the
differences themselves. A positive correlation between the paired
measurements will reduce the size of $s_D^2$ and, with it, $s_{\bar{D}}^2$.
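
The relation between the variance of the differences and the correlation
can be checked numerically. The following Python sketch is illustrative
only; the helper names `unbiased_variance` and `correlation` are
assumptions introduced here, and the data are the stress and nonstress
scores of the 10 subjects in the preceding example.

```python
import math

stress = [7, 9, 4, 15, 6, 3, 9, 5, 6, 12]
nonstress = [5, 15, 7, 11, 4, 7, 8, 10, 6, 16]

def unbiased_variance(values):
    """Sum of squared deviations divided by N - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

def correlation(x, y):
    """Product-moment correlation between paired observations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Variance of D computed directly on the differences.
diffs = [a - b for a, b in zip(stress, nonstress)]
direct = unbiased_variance(diffs)

# Variance of D obtained from the two variances and the correlation.
s1_sq = unbiased_variance(stress)
s2_sq = unbiased_variance(nonstress)
r12 = correlation(stress, nonstress)
from_formula = s1_sq + s2_sq - 2 * r12 * math.sqrt(s1_sq * s2_sq)

print(round(direct, 4), round(from_formula, 4), round(r12, 2))
```

Both routes give the same variance of differences, and the correlation
for these data is positive (about .60), which is why the variance of the
differences is smaller than the sum of the two separate variances.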


11.8 

Significance of the difference between means 
where population variances are unequal 


The t test for the significance of the difference between means assumes
equality of the population variances. Where the assumption of equality
of variance is untenable, the ordinary t test should not be applied.
Approximate methods for use where the variances are unequal have been
suggested by Cochran and Cox (1950) and by Welch (1938). The
method of Cochran and Cox makes an adjustment in the value of t
required for significance at the 5 or 1 per cent level, or other critical level
as may be required. The method proposed by Welch makes an adjustment
in the number of degrees of freedom.

To use the Cochran and Cox method we proceed by calculating the
standard error of the difference between the two means, using the
formula

$$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sum (X - \bar{X}_1)^2}{N_1(N_1 - 1)} + \frac{\sum (X - \bar{X}_2)^2}{N_2(N_2 - 1)}} = \sqrt{s_{\bar{X}_1}^2 + s_{\bar{X}_2}^2} \tag{11.11}$$

The difference between the sample means is then divided by the standard
error of the difference to obtain

$$t = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}}$$

One sample is based on $N_1$ cases with $N_1 - 1$ degrees of freedom, the other
on $N_2$ cases with $N_2 - 1$ degrees of freedom. Assume that a two-tailed
test at the 5 per cent level is appropriate. Refer to a table of t and obtain
the critical value of t required for significance at the 5 per cent level with
$N_1 - 1$ degrees of freedom. Obtain also the value of t required with
$N_2 - 1$ degrees of freedom. Denote these two values of t as $t_1$ and $t_2$.
The approximate value of t required for significance at the 5 per cent level
is given by the formula

$$t_{.05} = \frac{s_{\bar{X}_1}^2 t_1 + s_{\bar{X}_2}^2 t_2}{s_{\bar{X}_1}^2 + s_{\bar{X}_2}^2} \tag{11.12}$$

The value of t obtained by dividing the difference between means by the
standard error of their difference must be equal to or greater than $t_{.05}$
before significance at the 5 per cent level can be claimed.

Consider the following data:

Sample A                                   Sample B

$N_1 = 13$                                 $N_2 = 9$
$\bar{X}_1 = 26.99$                        $\bar{X}_2 = 15.10$
$\sum (X - \bar{X}_1)^2 = 1{,}128$         $\sum (X - \bar{X}_2)^2 = 1{,}269$
$s_{\bar{X}_1}^2 = 7.23$                   $s_{\bar{X}_2}^2 = 17.62$

The standard error of the difference between means is

$$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{1{,}128}{13(13 - 1)} + \frac{1{,}269}{9(9 - 1)}} = \sqrt{7.23 + 17.62} = 4.98$$

Divide this into the difference between means to obtain

$$t = \frac{26.99 - 15.10}{4.98} = 2.39$$

For 13 - 1 = 12 and 9 - 1 = 8 degrees of freedom, the values of t
required for significance at the 5 per cent level are, respectively, 2.179 and
2.306. The value required for significance at the 5 per cent level in
testing the significance of the difference between means is then

$$t_{.05} = \frac{7.23 \times 2.179 + 17.62 \times 2.306}{7.23 + 17.62} = 2.27$$

This value 2.27 is less than the obtained value 2.39. Consequently we
may conclude that the difference between means is significant at the
5 per cent level.
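
The weighted critical value of formula 11.12 is easily computed. The
Python sketch below is an illustration under stated assumptions: the
function name `cochran_cox_critical_t` is introduced here, the two
squared standard errors and the tabled values of t are those quoted in
the example, and the observed ratio 2.39 is taken as reported.

```python
def cochran_cox_critical_t(var_mean1, t1, var_mean2, t2):
    """Illustrative helper (name assumed): weighted critical t of formula 11.12."""
    return (var_mean1 * t1 + var_mean2 * t2) / (var_mean1 + var_mean2)

var_mean_a, var_mean_b = 7.23, 17.62   # squared standard errors of the two means
t_a, t_b = 2.179, 2.306                # tabled .05 values of t for 12 and 8 df
observed_t = 2.39                      # observed ratio as reported in the example

critical_t = cochran_cox_critical_t(var_mean_a, t_a, var_mean_b, t_b)
significant = observed_t >= critical_t
print(round(critical_t, 2), significant)
```

The critical value always lies between the two tabled values and is
drawn toward the one belonging to the sample whose mean has the larger
sampling variance.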

Another approximate method proposed by Welch (1947) requires the
calculation of a t value as above by dividing the difference between means
by their standard error. We then refer this value to the table of t using
the following formula for the number of degrees of freedom:

$$df = \frac{(s_{\bar{X}_1}^2 + s_{\bar{X}_2}^2)^2}{(s_{\bar{X}_1}^2)^2/(N_1 + 1) + (s_{\bar{X}_2}^2)^2/(N_2 + 1)} - 2 \tag{11.13}$$

Applying this formula to the previous data we obtain

$$df = \frac{(7.23 + 17.62)^2}{7.23^2/14 + 17.62^2/10} - 2 = 15.76$$

The value of df will seldom be a whole number. If df is taken as 16, the
value of t required for significance at the 5 per cent level is 2.12. If df is
taken as 15, the value is 2.13. In either case the observed value of t, 2.39,
exceeds the value required for significance at the 5 per cent level and we
may conclude that the difference between means is significant. This
result is in agreement with that obtained using the Cochran and Cox
procedure.
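
The adjusted number of degrees of freedom can be obtained with a few
lines of Python. This sketch follows formula 11.13 as given in this
section; the function name `welch_df` is an assumption introduced here,
and the inputs are the squared standard errors and sample sizes of the
preceding example.

```python
import math

def welch_df(var_mean1, n1, var_mean2, n2):
    """Illustrative helper (name assumed): degrees of freedom by formula 11.13."""
    numerator = (var_mean1 + var_mean2) ** 2
    denominator = var_mean1 ** 2 / (n1 + 1) + var_mean2 ** 2 / (n2 + 1)
    return numerator / denominator - 2

df = welch_df(7.23, 13, 17.62, 9)
# The result is fractional; bracket it with the neighbouring whole numbers.
lower_df, upper_df = math.floor(df), math.ceil(df)
print(round(df, 2), lower_df, upper_df)
```

Carried to full precision the result falls between 15.75 and 15.76,
agreeing with the worked value to within rounding, so the tabled values
of t for 15 and 16 degrees of freedom bracket the required critical value.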

The above procedures are approximate. For a more accurate
method the reader is referred to Welch (1947) and Aspin (1949). The
latter author has prepared tables which assist the comparison of means
involving two variances, separately estimated. The problem has also
been discussed by Gronow (1951).


11.9 

Significance of the difference between means 
when the population distributions are not normal 

The t test for the significance of the difference between means assumes
normality of the distributions of the variables in the populations from
which the samples are drawn. Where the variables are not normally
distributed, what effect will this have on the probabilities, and
significance levels, as estimated from the distribution of t?

Under certain conditions the sampling distribution of means of samples
of size N, where N is large, is closely approximated by the normal
distribution. This result holds regardless of the shape of the distribution
in the population from which the samples are drawn. The closeness of the
approximation improves as N becomes increasingly large. The implication of
this is that for large samples the nonnormality of the populations will
not seriously affect the estimation of probabilities, except perhaps in
cases of very extreme skewness.
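
The tendency of sample means toward normality can be illustrated by a
small sampling experiment. The Python sketch below is an assumption-laden
illustration, not part of the original argument: the skewed population
(an exponential distribution with mean 1), the sample size of 50, the
number of samples, and the helper name `skewness` are all choices made
here for demonstration.

```python
import random

def skewness(values):
    """Third standardized moment, used here as a simple index of asymmetry."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

rng = random.Random(7)   # fixed seed so that the run is repeatable

# A markedly skewed population: exponential with mean 1 (assumed for illustration).
population_draws = [rng.expovariate(1.0) for _ in range(20000)]

# Means of 2,000 samples, each of 50 cases, drawn from the same population.
sample_size = 50
sample_means = [
    sum(rng.expovariate(1.0) for _ in range(sample_size)) / sample_size
    for _ in range(2000)
]

pop_skew = skewness(population_draws)
means_skew = skewness(sample_means)
grand_mean = sum(sample_means) / len(sample_means)
print(round(pop_skew, 2), round(means_skew, 2), round(grand_mean, 2))
```

In runs of this kind the individual observations are strongly skewed
while the distribution of the sample means is much more nearly
symmetrical, and the mean of the sample means lies close to the
population mean.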

A number of investigators have studied the effect of nonnormal
populations on the t test for small samples. The empirical evidence
suggests that even for quite small samples, say, of the order of 5 or 10,
reasonably large departures from normality will not seriously affect the
estimation of probabilities for a two-tailed t test. A one-tailed t test is,
however, more seriously affected by nonnormality. This results from the
skewness of the sampling distribution.

Where the data show fairly gross departures from normality it is
probably advisable to use nonparametric, or distribution-free, methods.
These methods provide tests which are independent of the shapes of the
distributions in the populations from which the samples are drawn.
They deal with the ordinal or sign properties of the data. A number
of such tests are described in Chap. 22 of this book. Nonparametric
methods are being used with increasing frequency in psychological
research.


11.10

Effect of grouping on sampling error

The error introduced by grouping data in the form of a frequency
distribution exerts no systematic effect on the mean as an estimate of the
population value. The error variance of the mean computed from
grouped data is, however, greater than the error variance of the mean
computed from ungrouped data. The error variance of a mean for
grouped data is composed of two components, one resulting from sampling
error, the other from grouping error. The standard deviation is
systematically influenced by grouping error; the effect of grouping error is
to increase the standard deviation. In computing the standard error of
the mean for grouped data by the formula $s_{\bar{x}} = s_x/\sqrt{N}$, values of
$s_x$ uncorrected for grouping should be used. This results in a value of
$s_{\bar{x}}$ which is greater than that obtained by using the corrected value of
$s_x$. The use of the uncorrected value of $s_x$ adjusts for the increase in the
error variance of the mean resulting from grouping error. In general, in
applying any test of significance to statistics calculated from grouped data,
values uncorrected for grouping should be used.


EXERCISES 

1 The following are data for two samples of subjects under two
experimental conditions:

Sample A    2   5   7   9   6   7

Sample B    4  16  11   9   8

Test the significance of the difference between means using a
nondirectional test.

2 The following are data for two independent samples:

                            Sample A    Sample B

$\bar{X}$                      124         120
$N$                             50          36
$\sum (X - \bar{X})^2$       5,512       5,184

Test whether the mean for sample A is equal to or greater than that for
sample B.

3 The following are paired measurements obtained for a sample of eight
subjects under two conditions:

Condition A    8  17  12  19   5   0  20   3

Condition B   12  31  17  17   8  14  25   4

Test the significance of the difference between means using a
nondirectional test.

4 Calculate t for the following data:

                 Sample A    Sample B

$\bar{X}$            20          25
$N$                  25          10
$\sum X^2$       12,500       5,000

5 For a sample of 20 paired measurements $\sum D = 52$ and $\sum D^2 = 400$.
Calculate t.

6 What advantages attach to matched groups or paired observations in
experimentation?

7 The means for two independent samples of 10 and 17 cases are 9.03
and 14.10, respectively. The unbiased variance estimates are 04.02
and 220.30. Compare the methods proposed by Cochran and Cox
with those proposed by Welch to test the significance of the difference
between the two means.



Tests of Significance: 
Other Statistics 



12.1

Introduction


In Chap. 11 problems associated with the application of tests of
significance to arithmetic means were discussed. Not infrequently, tests
of significance for proportions, variances, correlation coefficients, or other
statistics are required. The general rationale underlying the application
of such tests of significance is precisely the same as that for arithmetic
means, although the technical procedures used in estimating the required
probabilities are different. The present chapter discusses procedures for
applying tests of significance to proportions, variances, and correlation
coefficients, for independent and correlated samples.


12.2 

Significance of the difference between two 

independent proportions 

Questions arise in the interpretation of experimental results which
require a test of significance of the difference between two independent
proportions. The data are comprised of two samples drawn independently.
Of the $N_1$ members in the first sample, $f_1$ have the attribute A.
Of the $N_2$ members in the second sample, $f_2$ have the attribute A.
The proportions having the attribute in the two samples are $f_1/N_1 = p_1$
and $f_2/N_2 = p_2$. Can the two samples be regarded as random samples
drawn from the same population? Is $p_1$ significantly different from $p_2$?
To illustrate, in a public opinion poll the proportion .05 in a sample of 
urban residents may express a favorable attitude toward a particular 
issue as against a proportion .on in a sample of rural residents. May the 
difference between the proportions be interpreted as indicative of an actual 
urban-rural difference in opinion? To illustrate further, the proportion 
of failures in air-crew training in two training periods may be .42 and .50.
Does this represent a significant change in the proportion of failures, 
or may the difference be attributed to sampling considerations? 



The standard error of a single proportion is estimated by the formula

$$s_p = \sqrt{\frac{pq}{N}}$$

where p = sample value of a proportion
      q = 1 - p

The standard error of the difference between two proportions based on
independent samples is estimated by

$$s_{p_1 - p_2} = \sqrt{pq\left(\frac{1}{N_1} + \frac{1}{N_2}\right)}$$

where p is an estimate based on the two samples combined. The value
p is obtained by adding together the frequencies of occurrence of the
attribute in the two samples and then dividing this by the total number
in the two samples. Thus

$$p = \frac{f_1 + f_2}{N_1 + N_2}$$

where $f_1$ and $f_2$ are the two frequencies.

The justification for combining data from the two samples to obtain
a single estimate of p resides in the fact that in all cases where the
difference between two proportions is tested, the null hypothesis is assumed.
This hypothesis states that no difference exists in the population
proportions. Because the null hypothesis is assumed, we may use an estimate
of p based on the data combined for the two samples. This procedure is
analogous to that used in the t test for the difference between means for
independent samples where the sums of squares for the two samples are
combined to obtain a single variance estimate.

To test the difference between two proportions we divide the observed
difference between the proportions by the estimate of the standard
error of the difference to obtain

$$z = \frac{p_1 - p_2}{s_{p_1 - p_2}} = \frac{p_1 - p_2}{\sqrt{pq[(1/N_1) + (1/N_2)]}} \tag{12.2}$$

The value z may be interpreted as a deviate of the unit normal curve,
provided $N_1$ and $N_2$ are reasonably large and p is neither very small nor
very large. As usual for a two-tailed test, values of 1.96 and 2.58 are
required for significance at the 5 and 1 per cent levels.

How large should the N's be and how far should p depart from
extreme values before this ratio can be interpreted as a deviate of the unit
normal curve? An arbitrary rule may be used here. If the smaller value
of p or q multiplied by the smaller value of N exceeds 5, then the ratio may
be interpreted with reference to the normal curve. Thus if p = .60,
q = .40, $N_1 = 20$, and $N_2 = 30$, the product .40 X 20 = 8 and the
normal curve may be used.

To illustrate, we refer to data obtained in a study of the attitudes of
Canadians to immigrants and immigration policy. Independent samples
of French- and English-speaking Canadians were used. Subjects were
asked whether they agreed or disagreed with present government
immigration practices. In the French-speaking sample of 300 subjects, 176
subjects indicated agreement. The proportion $p_1$ is 176/300 = .587.
In the English-speaking sample of 500 subjects, 384 indicated agreement.
The proportion $p_2$ is 384/500 = .768. By combining data for the two
samples we obtain a value

$$p = \frac{176 + 384}{300 + 500} = .700$$

The value of q is 1 - .700 = .300. The estimate of the standard error
of the difference is

$$s_{p_1 - p_2} = \sqrt{.700 \times .300\left(\frac{1}{300} + \frac{1}{500}\right)} = .033$$

The required z value is

$$z = \frac{.768 - .587}{.033} = 5.41$$

Interpreting the value 5.41 as a unit-normal-curve deviate we observe
immediately that the difference is highly significant. The chances are
one in a great many millions that the observed difference could result from
sampling. We may very safely conclude from these data that a real
difference exists between French- and English-speaking Canadians on the
question asked.
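
The steps of this test, including the rule of thumb for the use of the
normal curve, may be sketched in Python. The function name
`two_proportion_z` and its return layout are assumptions introduced here
for illustration; the frequencies are those of the French- and
English-speaking samples.

```python
import math

def two_proportion_z(f1, n1, f2, n2):
    """Illustrative helper (name assumed): pooled p, standard error, z, and rule check."""
    p1, p2 = f1 / n1, f2 / n2
    p = (f1 + f2) / (n1 + n2)        # estimate from the two samples combined
    q = 1 - p
    se = math.sqrt(p * q * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Arbitrary rule: smaller of p or q times the smaller N should exceed 5.
    normal_curve_ok = min(p, q) * min(n1, n2) > 5
    return p, se, z, normal_curve_ok

p, se, z, normal_curve_ok = two_proportion_z(176, 300, 384, 500)
print(round(p, 3), round(se, 3), round(abs(z), 2), normal_curve_ok)
```

Because the sketch carries unrounded proportions through the
calculation, its z may differ in the second decimal place from a value
worked by hand with rounded figures; the conclusion is unaffected.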

An alternative, but closely related, method exists for testing the 
significance of the difference between proportions for independent sam- 
ples. This method uses chi square and is described in Chap. 13. 


12.3 

Significance of the difference between two 
correlated proportions 

Frequently in psychological work we wish to test the significance of the
difference between two proportions based on the same sample of
individuals or on matched samples. The data consist of pairs of observations
and are usually nominal in type. The paired observations may exhibit
a correlation, which must be taken into consideration in testing the
difference between proportions. To illustrate, a psychological test may be
administered to a sample of N individuals. The proportions passing
items 1 and 2 are $p_1$ and $p_2$. Paired observations are available for each
individual. One individual may “pass” item 1 and also “pass” item 2.
A second individual may “pass” item 1 and “fail” item 2. A third
individual may “fail” both items. The paired observations may be tabulated
in a 2 X 2 table. A tendency may exist for individuals who pass item 1
to also pass item 2 and for those who fail item 1 to also fail item 2. Thus
the items are correlated. A further illustration arises where attitudes
are measured with an attitude scale before and after a program designed
to induce attitude change. On any particular attitude item a before
response and an after response are available. Thus the data are comprised
of a set of paired observations. To apply a test of significance to
the difference between before and after proportions on any particular item
requires that the correlation between responses be taken into account.

We proceed by tabulating the data in the form of a fourfold, or 2 X 2,
table. A table with four cell frequencies is obtained. By way of
illustration, assume that the data are “pass” and “fail” on two test items.
The data may be represented schematically as follows:

Frequencies

                        Item 2
                    Fail      Pass
Item 1    Pass       A         B        A + B
          Fail       C         D        C + D
                   A + C     B + D        N

Proportions

                        Item 2
                    Fail      Pass
Item 1    Pass       a         b        $p_1$
          Fail       c         d        $q_1$
                   $q_2$     $p_2$

The capital letters represent frequencies. The small letters are
proportions obtained by dividing the frequencies by N. The proportions
passing the two items are $p_1$ and $p_2$. We wish to test the significance of
the difference between $p_1$ and $p_2$.

An estimate of the standard error of the difference between two
correlated proportions is given by the formula

$$s_{p_1 - p_2} = \sqrt{\frac{a + d}{N}} \tag{12.3}$$

This formula is due to McNemar (1947, 1955). It takes into account the
correlation between the paired observations. A normal deviate z is
obtained by dividing the difference between the two proportions by the
standard error of the difference. Thus

$$z = \frac{p_1 - p_2}{\sqrt{(a + d)/N}} \tag{12.4}$$

When the sum of the two cell frequencies, A + D, is reasonably large,
this ratio can be interpreted as a unit-normal-curve deviate, values of
1.96 and 2.58 being required for significance at the 5 and 1 per cent levels
for a two-tailed test. In this context a reasonably large value of A + D
may be taken as about 20 or above.

It may be shown that the formula for the value of z given above
reduces, apart from sign, to

$$z = \frac{D - A}{\sqrt{A + D}} \tag{12.5}$$

where A and D are the cell frequencies. For computational purposes
this is the more useful formulation.

To illustrate, consider the following fictitious data relating to
attitude change. Let us assume an initial testing followed by a program
intended to produce a change in attitude, and then a second testing with
the same attitude scale. On a particular question let the data for the two
testings be as follows:

Frequencies

                           2d testing
                      Disagree     Agree
1st testing  Agree        10         50        60
             Disagree    110         30       140
                         120         80       200

Proportions

                           2d testing
                      Disagree     Agree
1st testing  Agree       .05        .25       .30
             Disagree    .55        .15       .70
                         .60        .40      1.00

Inspection of the above tables indicates a high correlation in response
between the first and second testings. We wish to test the significance
of the difference between .40 and .30. The standard error of the
difference between the two proportions is

$$s_{p_1 - p_2} = \sqrt{\frac{.05 + .15}{200}} = .0316$$

The value of z is

$$z = \frac{.40 - .30}{.0316} = 3.16$$

In this case the difference is significant. It exceeds the value of 2.58
required for significance at the 1 per cent level for a two-tailed test.
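
The two formulations of the test for correlated proportions can be
compared directly. In the Python sketch below the variable names are
assumptions introduced for illustration, and the four cell frequencies
are those of the attitude-change example (A = agree then disagree,
B = agree on both testings, C = disagree on both, D = disagree then
agree).

```python
import math

# Cell frequencies of the fourfold table for the two testings.
A, B, C, D = 10, 50, 110, 30
N = A + B + C + D

# Computational form (formula 12.5), using only the two "changer" cells.
z_from_frequencies = (D - A) / math.sqrt(A + D)

# Form based on proportions: difference divided by sqrt((a + d) / N).
a, d = A / N, D / N
p_first = (A + B) / N      # proportion agreeing on the first testing
p_second = (B + D) / N     # proportion agreeing on the second testing
se = math.sqrt((a + d) / N)
z_from_proportions = (p_second - p_first) / se

print(round(z_from_frequencies, 2), round(z_from_proportions, 2), round(se, 4))
```

Only the subjects who changed their response enter the computational
form, which is why the sum A + D, rather than N, governs whether the
normal curve may be used.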





Arguments may be advanced for the use of a one-tailed test with the
above data. It may be assumed that knowledge of a program intended
to induce attitude change may warrant a hypothesis about the direction
of the change. In either case the result is significant.


12.4 

Significance of the difference between variances 
for independent samples 

Occasions arise where a test of the significance of the difference between
the variances of measurements for two independent samples is required.
In the conduct of a simple experiment using control and experimental
groups, the effect of the experimental condition may reflect itself not only
in a mean difference between the two groups but also in a variance
difference. For example, in an experiment designed to study the effect of a
distracting agent such as noise on motor performance, the effect of the
distraction may be to greatly increase the variability of performance in
addition possibly to exerting some effect upon the mean. The variances
obtained in any experiment should always be the object of scrutiny and
comparison. A common situation where a test of the significance of the
difference between variances is required is in relation to the t test for the
significance of the difference between two means. This test assumes the
equality of variances in the populations from which the samples are drawn;
that is, it assumes that $\sigma_1^2 = \sigma_2^2 = \sigma^2$. This condition is
usually spoken of as homogeneity of variance.

Let $s_1^2$ and $s_2^2$ be two variances based on independent samples.
We may consider the difference $s_1^2 - s_2^2$. An alternate procedure is to
consider the ratio $s_1^2/s_2^2$ or $s_2^2/s_1^2$. If the two variances are equal,
this ratio will be unity. If they differ and $s_1^2 > s_2^2$, then
$s_1^2/s_2^2 > 1$ and $s_2^2/s_1^2 < 1$. A departure of the variance ratio from
unity is indicative of a difference between variances; the greater the
departure the greater the difference. Quite clearly, a test of the
significance of the departure of the ratio of two variances from unity will
serve as a test of the significance of the difference between the two
variances.

To apply such a test the sampling distribution of the ratio of two
variances is required. To conceptualize such a sampling distribution,
consider two normal populations A and B with the same variance $\sigma^2$.
Draw samples of $N_1$ cases from A and $N_2$ cases from B, calculate unbiased
variance estimates $s_1^2$ and $s_2^2$, and compute the ratio $s_1^2/s_2^2$.
Continue this procedure until a large number of variance ratios is obtained.
Always place the variance of the sample drawn from A in the numerator
and the variance of the sample drawn from B in the denominator. Some
of the variance ratios will be greater than unity; others will be less than
unity. The frequency distribution of the variance ratios for a large
number of pairs of variances is an experimental sampling distribution.
The corresponding theoretical sampling distribution of variance ratios is
known as the distribution of F. The variance ratio is known as an F
ratio; that is, $F = s_1^2/s_2^2$, or $F = s_2^2/s_1^2$.
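
An experimental sampling distribution of this kind is readily produced
by machine. The Python sketch below is an illustration under assumptions
made here: two normal populations with mean 50 and standard deviation 10,
samples of 10 cases from each, 4,000 repetitions, and the helper name
`unbiased_variance`. The value 3.18 is the 5 per cent point of F for 9
and 9 degrees of freedom as given in standard F tables.

```python
import random

def unbiased_variance(values):
    """Sum of squared deviations divided by N - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

rng = random.Random(12)    # fixed seed so that the run is repeatable
n1, n2 = 10, 10            # assumed sample sizes, giving 9 and 9 df

ratios = []
for _ in range(4000):
    sample_a = [rng.gauss(50.0, 10.0) for _ in range(n1)]   # population A
    sample_b = [rng.gauss(50.0, 10.0) for _ in range(n2)]   # population B, same variance
    # Variance of the sample from A always goes in the numerator.
    ratios.append(unbiased_variance(sample_a) / unbiased_variance(sample_b))

share_above_unity = sum(r > 1 for r in ratios) / len(ratios)
share_beyond_table = sum(r > 3.18 for r in ratios) / len(ratios)
print(round(share_above_unity, 3), round(share_beyond_table, 3))
```

With equal sample sizes roughly half of the ratios should exceed unity,
and about 5 per cent should exceed the tabled value, which cuts off one
tail of the distribution.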

In the above illustration samples of $N_1$ are drawn from one population
and samples of $N_2$ from another. $N_1 - 1$ and $N_2 - 1$ degrees of
freedom are associated with the two variance estimates. A separate
sampling distribution of F exists for every combination of degrees of 
freedom. Table D of the Appendix shows values of F required for sig- 
nificance at the 5 and 1 per cent levels for varying combinations of degrees 
of freedom. This table shows values of F equal to or greater than unity. 
It does not show values of F less than unity. The number of degrees of 
freedom associated with the variance estimates in the numerator and 
denominator are shown along the top and to the left, respectively, of 
Table D. The numbers in lightface type are the values for significance 
at the 5 per cent level, and those in boldface type the values at the 1 per 
cent level. These values cut off 5 and 1 per cent of one tail of the dis- 
tribution of F. 

In testing the significance of the difference between two variances,
the null hypothesis $H_0\colon \sigma_1^2 = \sigma_2^2 = \sigma^2$ is assumed. We then
find the ratio of the two unbiased variance estimates. These are

$$s_1^2 = \frac{\sum (X - \bar{X}_1)^2}{N_1 - 1}$$

and

$$s_2^2 = \frac{\sum (X - \bar{X}_2)^2}{N_2 - 1}$$

No prior grounds exist for deciding which variance estimate should be 
placed in the numerator and which in the denominator of the F ratio. In 
practice the larger of the two variance estimates is always placed in the 
numerator and the smaller in the denominator. In consequence the F 
ratio in this situation is always greater than unity. The F ratio is calcu- 
lated, referred to Table D of the Appendix, and a significance level deter- 
mined. At this point a slight complication arises. The obtained sig- 
nificance level must be doubled. Table D shows values required for 
significance at the 5 and the 1 per cent levels. In comparing the vari- 
ances for two independent groups these become the 10 and the 2 per cent 
levels. The reason for this complication resides in the fact that the
larger of the two variances has been placed in the numerator of the F ratio.
This means that we have considered one tail only of the F distribution.
Not only must we consider the probability of obtaining $s_1^2/s_2^2$ but also
the probability of $s_2^2/s_1^2$. Where interest is in the significance of the
difference, regardless of direction, the required per cent or probability
levels are simply obtained by doubling those shown in Table D.

Table D has been prepared for use with the analysis of variance 
(Chap. 18) which makes extensive use of the F ratio. In the analysis of 
variance the decision as to which variance estimate should be put in the 
numerator and which in the denominator is made on grounds other than 
their relative size. Consequently, in the analysis of variance, F ratios 
less than unity can occur and Table D provides the appropriate proba- 
bilities without any doubling procedure. 

To illustrate, a psychological test is administered to a sample of
31 boys and 26 girls. The sum of squares of deviations, $\sum (X - \bar{X})^2$, is
1,926 for boys and 2,875 for girls. Unbiased variance estimates are
obtained by dividing the sum of squares by the number of degrees of
freedom. The df for boys is 31 - 1 = 30 and for girls 26 - 1 = 25.
The variance estimate for boys is 1,926/30 = 64.20 and for girls
2,875/25 = 115.00.

Are boys significantly different from girls in the variability of their
performance on this test? The F ratio is 115.00/64.20 = 1.79. The df for
the numerator is 25 and for the denominator 30. Referring this F to
Table D we see that a value of F of about 1.88 is required for significance
at the 5 per cent level of the table; on doubling, this becomes the 10 per
cent level. It is clear, therefore, that the difference between the variances
for boys and girls cannot be considered statistically significant. The
evidence is insufficient to warrant rejection of the null hypothesis.
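
The F ratio of this example may be computed as follows. The Python
sketch is illustrative only: the function name `variance_ratio` is an
assumption introduced here, and the boys' sum of squares is taken as
1,926, the value consistent with the stated variance estimate of 64.20.

```python
def variance_ratio(ss1, n1, ss2, n2):
    """Illustrative helper (name assumed): (F, df numerator, df denominator).

    The larger variance estimate is always placed in the numerator.
    """
    var1 = ss1 / (n1 - 1)
    var2 = ss2 / (n2 - 1)
    if var1 >= var2:
        return var1 / var2, n1 - 1, n2 - 1
    return var2 / var1, n2 - 1, n1 - 1

# Boys: sum of squares 1,926 with N = 31; girls: 2,875 with N = 26.
f_ratio, df_num, df_den = variance_ratio(1926, 31, 2875, 26)
tabled_f = 1.88   # approximate tabled value quoted for 25 and 30 df
print(round(f_ratio, 2), df_num, df_den, f_ratio >= tabled_f)
```

Because the larger estimate is forced into the numerator, the tabled 5
per cent value corresponds here to the 10 per cent level for a
nondirectional comparison.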


12.5 

Significance of the difference between 
correlated variances 

(liven a set of paired observations, the two variances are not independent 
estimates. Data of this kind arise when the same subjects are tested 
under two experimental conditions, or matched samples are used. For 
example, in an experiment designed to study the effects of an educational 
program on attitude change, attitudes may be measured, an educational 
program applied, and attitudes remeasured. It may be hypothesized 
that some change in variance of attitude-test scores may result. An 



x84 


Tests of significance: other statistics 


chap. 12 


increase in variance may mean that the effect of the program is to 
reinforce existing attitudes, producing more extreme attitudes among 
individuals at both ends of the attitude continuum. A decrease in vari- 
ance may mean that the effect of the program is to produce an attitudinal 
regression to greater uniformity. 

If $s_1^2$ and $s_2^2$ are the two unbiased variance estimates and $r_{12}$ is the
correlation between the paired observations, the quantity

$$t = \frac{(s_1^2 - s_2^2)\sqrt{N - 2}}{\sqrt{4 s_1^2 s_2^2 (1 - r_{12}^2)}}$$

has a t distribution with N - 2 degrees of freedom.

By way of illustration let $s_1^2$ and $s_2^2$ be unbiased variance estimates of
attitude scale scores before and after the administration of an educational
program. Let $s_1^2 = 153.20$ and $s_2^2 = 102.51$ where N = 38. The correlation
between the before-and-after attitude measures is .60. Are the
two variances significantly different from each other? We obtain

$$t = \frac{(153.20 - 102.51)\sqrt{38 - 2}}{\sqrt{4 \times 153.20 \times 102.51 \times (1 - .36)}} = 1.52$$

The number of degrees of freedom is 38 — 2 = 36. For significance at 
the 5 per cent level, a value of t equal to or greater than about 2.03 is 
required. The evidence is insufficient to warrant rejection of the null 
hypothesis. We cannot argue that the intervening educational program 
has changed the variability of attitudes. 
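
As a check on the arithmetic, the test for correlated variances can be sketched as follows. The function name `t_correlated_variances` is an assumption made for illustration; the formula is the one given above, and the inputs are the values of the example.

```python
import math


def t_correlated_variances(var1, var2, r12, n):
    """t for the difference between two correlated variances; df = n - 2."""
    numerator = (var1 - var2) * math.sqrt(n - 2)
    denominator = math.sqrt(4 * var1 * var2 * (1 - r12 ** 2))
    return numerator / denominator, n - 2


# Before-and-after variance estimates, their correlation, and N from the example.
t_value, df = t_correlated_variances(153.20, 102.51, 0.60, 38)
print(round(t_value, 2), df)
```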

12.6 

Sampling distribution of the correlation coefficient 

We may draw a large number of samples from a population, compute a 
correlation coefficient for each sample, and prepare a frequency distribu- 
tion of correlation coefficients. Such a frequency distribution is an 
experimental sampling distribution of the correlation coefficient. To 
illustrate, casual observation suggests that a positive correlation exists 
between height and weight. A number of samples of 25 cases may be 
drawn at random from a population of adult males, and a correlation 
coefficient between height and weight computed for each sample. These
coefficients will display variation one from another. By arranging them 
in the form of a frequency distribution an experimental sampling dis- 
tribution of the correlation coefficient is obtained. The mean of this 
distribution will tend to approach the population value of the correlation 
coefficient with increase in the number of samples. Its standard deviation
will describe the variability of the coefficients from sample to sample.
A further illustration may prove helpful. By throwing a pair of
dice a number of times, say, a white one and a red one, a set of paired 
observations is obtained. A correlation coefficient may be calculated 
for the paired observations. Since the two dice are independent, the 
expected value of this correlation coefficient is zero. However, for any 
particular sample of N throws, a positive or a negative correlation may 
result. A large number of samples of N throws may be obtained, a cor- 
relation coefficient computed for each sample, and a frequency distribu- 
tion of the coefficients prepared. The mean of this experimental sam- 
pling distribution will tend to approach zero, the population value of the 
correlation coefficient, and its standard deviation will be descriptive of 
the variability of the correlation in drawing samples of size N from this 
particular kind of population. Note that here, as in all sampling prob- 
lems, a distinction is drawn between a population value and an estimate 
of that value based on a sample. The symbol $\rho$ is used to refer to the
population value of the correlation coefficient, and r is the sample value.

The shape of the sampling distribution of the correlation coefficient
depends on the population value $\rho$. As $\rho$ departs from zero, the sampling
distribution becomes increasingly skewed. When $\rho$ is high positive, say,
$\rho = .80$, the sampling distribution has extreme negative skewness. Similarly,
when $\rho$ is high negative, say, $\rho = -.80$, the distribution has extreme
positive skewness. When $\rho = 0$, the sampling distribution is symmetrical
and for large values of N, say, 30 or above, is approximately
normal. The reason for the increase in skewness in the sampling distribution
as $\rho$ departs from zero is intuitively plausible. In sampling, for
example, from a population where $\rho = .90$, values greater than 1.00
cannot occur, whereas values extending from .90 to -1.00 are theoretically
possible. The range of possible variation below .90 is far
greater than the range above .90. This suggests that the sample values
may exhibit greater variability below than above .90, a circumstance
which leads to negative skewness.

The standard deviation of the theoretical sampling distribution of r,
the standard error, is given by the formula

$$\sigma_r = \frac{1 - \rho^2}{\sqrt{N - 1}} \tag{12.7}$$

When $\rho$ departs appreciably from zero, this formula is of little use,
because the departures of the sampling distribution from normality make
interpretation difficult.

Difficulties resulting from the skewness of the sampling distribution
of the correlation coefficient are resolved by a method developed by R. A.
Fisher. Values of r are converted to values of $z_r$ using the transformation

$$z_r = \tfrac{1}{2}\log_e(1 + r) - \tfrac{1}{2}\log_e(1 - r) \tag{12.8}$$

Values of $z_r$ corresponding to particular values of r need not be computed
directly from the above formula, but may be simply obtained from Table
E in the Appendix. For r = .50 the corresponding $z_r$ = .549, for r = .90
$z_r$ = 1.472, and so on. For negative values of r the corresponding $z_r$
values may be given a negative sign. In a number of sampling problems
involving correlation, r's are converted to $z_r$'s and a test of significance is
applied to the $z_r$'s instead of to the original r's.

One advantage of this transformation resides in the fact that the
sampling distribution of $z_r$ is for all practical purposes independent of $\rho$.
The distribution has the same variability for a given N regardless of the
size of $\rho$. Another advantage is that the sampling distribution of $z_r$ is
approximately normal. Values of $z_r$ can be interpreted in relation to the
normal curve. The standard error of $z_r$ is given by

$$s_{z_r} = \frac{1}{\sqrt{N - 3}} \tag{12.9}$$

The standard error is seen to depend entirely on the sample size.

The $z_r$ transformation may be used to obtain confidence limits for r.
Let r = .82 for N = 147. The corresponding $z_r$ = 1.157. The standard
error of $z_r$, given by $1/\sqrt{N - 3}$, is .083. The 95 per cent confidence
limits are obtained by taking 1.96 times the standard error above and
below the observed value of $z_r$, or $z_r \pm 1.96 s_{z_r}$. These are 1.157 + 1.96 ×
.083 = 1.320 and 1.157 - 1.96 × .083 = .994. These two $z_r$'s may now
be converted back to r's, where $z_r$ = 1.320, r = .867 and where $z_r$ = .994,
r = .759. Thus we may assert with 95 per cent confidence that the population
value of the correlation coefficient falls within the limits .759 and
.867. In practice we are infrequently concerned with fixing confidence
intervals for correlation coefficients. The most frequently occurring 
problems are testing the significance of a correlation coefficient from zero 
and testing the significance of the difference between two correlation 
coefficients. 
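
The confidence-limit calculation can be sketched in Python without a table of $z_r$, since the transformation of formula 12.8 is the inverse hyperbolic tangent and its inverse is the hyperbolic tangent. The function names below are hypothetical, and the sketch works with unrounded intermediate values, so later decimals may differ slightly from a hand calculation.

```python
import math


def fisher_z(r):
    """Fisher's z_r; math.atanh(r) equals 0.5*log(1 + r) - 0.5*log(1 - r)."""
    return math.atanh(r)


def r_confidence_limits(r, n, normal_deviate=1.96):
    """Confidence limits for a correlation, computed on the z_r scale."""
    z = fisher_z(r)
    se = 1 / math.sqrt(n - 3)          # standard error of z_r
    lower_z = z - normal_deviate * se
    upper_z = z + normal_deviate * se
    # math.tanh converts the z_r limits back to correlations.
    return math.tanh(lower_z), math.tanh(upper_z)


# Example from the text: r = .82, N = 147, 95 per cent limits.
lower, upper = r_confidence_limits(0.82, 147)
print(round(lower, 3), round(upper, 3))
```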


12.7 

Significance of a correlation coefficient 

Testing the significance of the correlation between a set of paired observations
is a frequent problem in psychological research. We begin by
assuming the null hypothesis that the value of the correlation coefficient is
equal to zero, or $H_0\colon \rho = 0$. A test of significance may then be applied
using the distribution of t. The t value required is given by the formula

$$t = r\sqrt{\frac{N - 2}{1 - r^2}} \tag{12.10}$$

The number of degrees of freedom associated with this value of t is N - 2.
The loss of 2 degrees of freedom results because testing the significance of
r from zero is equivalent to testing the significance of the slope of a regression
line from zero. The reader will recall that the correlation coefficient
is the slope of a regression line in standard-score form. The number of
degrees of freedom associated with the variability about a straight line
fitted to a set of points is two less than the number of observations. A
straight line will always fit two points exactly, and no freedom to vary is
possible. With three points there is 1 degree of freedom, with four points
2 degrees of freedom, and so on.

Consider an example where r = .50 and N = 20. We obtain

$$t = .50\sqrt{\frac{20 - 2}{1 - .50^2}} = 2.45$$

The df = 20 - 2 = 18. Referring to the table of t, Table B in the
Appendix, we find that for this df a t of 2.10 is required for significance at
the 5 per cent level and a t of 2.88 at the 1 per cent level. The sample
value of t falls between these two values. The correlation may be said to
be significant at the 5 per cent level.
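
Formula 12.10 is easily sketched in code. The function name `t_for_correlation` is an illustrative assumption; the critical values still come from the table of t.

```python
import math


def t_for_correlation(r, n):
    """t for testing a correlation against zero (formula 12.10); df = n - 2."""
    t = r * math.sqrt((n - 2) / (1 - r ** 2))
    return t, n - 2


# Example from the text: r = .50, N = 20.
t_value, df = t_for_correlation(0.50, 20)
print(round(t_value, 2), df)
```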

Table F of the Appendix presents a tabulation of the values of r
required for significance at different levels. We note that where the
number of degrees of freedom is small, a large value of r is required for
significance. For example, where df = 5, a value of r ≥ .754 is required
before we can argue at the 5 per cent level that the r is significant. Even
for df = 20, a value of r ≥ .423 is required for significance at the 5 per
cent level. This means that little importance can be attached to correlation
coefficients calculated on small samples unless these coefficients are
fairly substantial in size.

12.8 

Significance of the difference between two 
correlation coefficients for independent samples 

Consider a situation where two correlation coefficients, $r_1$ and $r_2$, are
obtained on two independent samples. The correlation coefficients may,
for example, be correlations between intelligence-test scores and
mathematics-examination marks for two different freshman classes. We wish
to test whether $r_1$ is significantly different from $r_2$, that is, whether the
two samples can be considered random samples from a common population.
The null hypothesis is $H_0\colon \rho_1 = \rho_2$ or $H_0\colon \rho_1 - \rho_2 = 0$.

The significance of the difference between $r_1$ and $r_2$ can be readily
tested using Fisher's $z_r$ transformation. Convert $r_1$ and $r_2$ to $z_r$'s using
Table E of the Appendix. As stated previously, the sampling distribution
of $z_r$ is approximately normal with a standard error given by

$$s_{z_r} = \frac{1}{\sqrt{N - 3}}$$

The standard error of the difference between two values of $z_r$ is given by

$$s_{z_{r1} - z_{r2}} = \sqrt{\frac{1}{N_1 - 3} + \frac{1}{N_2 - 3}} \tag{12.11}$$

By dividing the difference between the two values of $z_r$ by the standard
error of the difference, we obtain the ratio

$$z = \frac{z_{r1} - z_{r2}}{\sqrt{1/(N_1 - 3) + 1/(N_2 - 3)}} \tag{12.12}$$

This is a unit-normal-curve deviate and may be so interpreted. Values of
1.96 and 2.58 are required for significance at the 5 per cent and 1 per cent
levels, respectively.

To illustrate, let the correlations between intelligence scores and
mathematics-examination marks for two freshman classes be .320 and
.720. Let the number of students in the first class be 53 and in the second
23. Are the two coefficients significantly different? The corresponding
$z_r$ values obtained from Table E of the Appendix are .332 and .908. The
required normal deviate is

$$z = \frac{.908 - .332}{\sqrt{1/(53 - 3) + 1/(23 - 3)}} = 2.18$$

The difference between the two correlations is significant at the 5 per cent
level.
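
The same comparison can be sketched in Python, with `math.atanh` standing in for Table E. The function name is hypothetical, and the result is computed from unrounded $z_r$ values.

```python
import math


def z_difference_independent(r1, n1, r2, n2):
    """Normal deviate for the difference between two independent r's (12.12)."""
    se_difference = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (math.atanh(r1) - math.atanh(r2)) / se_difference


# Example from the text: r = .720 with N = 23 against r = .320 with N = 53.
z = z_difference_independent(0.720, 23, 0.320, 53)
print(round(z, 2))
```

A value between 1.96 and 2.58 in absolute size is significant at the 5 per cent level but not at the 1 per cent level.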

The application of a test of significance in a situation of this kind is 
simple. The interpretation of what the difference in correlation means 
may be difficult. 


12.9 

Significance of the difference between two 
correlation coefficients for correlated samples 

Consider a situation where three measurements have been made on the
same sample of individuals. Three correlation coefficients result, $r_{12}$, $r_{13}$,
and $r_{23}$. If we wish to compare $r_{12}$ and $r_{13}$, or $r_{12}$ and $r_{23}$, or $r_{13}$ and
$r_{23}$, the method described in the preceding section does not apply. Here
the two coefficients under comparison are not based on independent samples
but are based on the same sample and are correlated.

To test the difference between $r_{12}$ and $r_{13}$ under these conditions, we
may calculate a value t by the formula

$$t = \frac{(r_{12} - r_{13})\sqrt{(N - 3)(1 + r_{23})}}{\sqrt{2(1 - r_{12}^2 - r_{13}^2 - r_{23}^2 + 2 r_{12} r_{13} r_{23})}}$$

This expression follows the distribution of t with N - 3 degrees of freedom.
Note that to apply this test the correlation $r_{23}$ is required.

Let $X_2$ and $X_3$ be two psychological tests used to predict a criterion
measure of scholastic success $X_1$. The three correlation coefficients based
on a sample of 100 cases are $r_{12} = .60$, $r_{13} = .50$, and $r_{23} = .50$. Are
$X_2$ and $X_3$ significantly different as predictors of scholastic success?
Is there a reasonable probability that the difference between the two
correlations $r_{12}$ and $r_{13}$ can be explained in terms of sampling error?
The value of t is

$$t = \frac{(.60 - .50)\sqrt{(100 - 3)(1 + .50)}}{\sqrt{2(1 - .60^2 - .50^2 - .50^2 + 2 \times .60 \times .50 \times .50)}} = 1.29$$

For df = 97, a t of about 1.99 is required for significance at the 5 per cent 
level. In consequence, the difference between the two correlation coef- 
ficients cannot be said to be significant. 
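
A short sketch of this calculation follows. The function name `t_correlated_correlations` is an assumption for illustration; the expression is the formula given above for two correlations sharing a variable.

```python
import math


def t_correlated_correlations(r12, r13, r23, n):
    """t for the difference between r12 and r13 on one sample; df = n - 3."""
    numerator = (r12 - r13) * math.sqrt((n - 3) * (1 + r23))
    denominator = math.sqrt(
        2 * (1 - r12 ** 2 - r13 ** 2 - r23 ** 2 + 2 * r12 * r13 * r23)
    )
    return numerator / denominator, n - 3


# Example from the text: r12 = .60, r13 = .50, r23 = .50, N = 100.
t_value, df = t_correlated_correlations(0.60, 0.50, 0.50, 100)
print(round(t_value, 2), df)
```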

The above test has certain restrictive assumptions underlying its 
development and because of these is perhaps not entirely satisfactory. 
For further discussion see Walker and Lev.


EXERCISES 

1 Given two random samples of size 100 with sample values $p_1$ = .80,
$p_2$ = .60, $N_1$ = 60, and $N_2$ = 40, test the significance of the difference
between $p_1$ and $p_2$.

2 Consider two test items A and B. In a sample of 100 people, 30 pass 
item A and fail item B, whereas 20 fail item A and pass item B.
Are the proportions passing the two items significantly different from 
each other? 

3 In a market survey 24 out of 96 males and 63 out of 180 females 
indicate a preference for a particular brand of cigarettes. Do the 
data warrant the conclusion that a sex difference exists in brand 
preference? 





4 On an attitude scale, 63 and 39 individuals from a sample of 140
indicate agreement to items A and B, respectively, and 29 individuals
indicate agreement to both items. Is there a significant difference in 
the response elicited by the two items? 

5 Given two independent samples of size 20 with $s_1^2$ = 400 and
$s_2^2$ = 625, test the hypothesis that the variances are significantly
different from each other.

6 Given two correlated samples of size 20 with $s_1^2$ = 400, $s_2^2$ = 625,
and $r_{12}$ = .7071, test the hypothesis that the variances are significantly
different from each other.

7 Calculate values of $z_r$ for r = .70, r = .03, r = -.60, and r = -.99.

8 Calculate, using formula 12.10, values of t for the following values of
r and N:

r   .20   .30   .40   .30

N   30    40    30    20

9 The correlation between psychological-test scores and academic 
achievement for a sample of 147 freshmen is .40. The corresponding 
correlation for a sample of 123 sophomores is .39. Do these correla- 
tions differ significantly? 

10 Three psychological tests are administered to a sample of 30 students.
The correlations are $r_{12}$ = .70, $r_{13}$ = .40, and $r_{23}$ = .00. Is $r_{12}$
significantly different from $r_{13}$?



Chi Square 


13.1

Introduction

We have previously discussed the application of the binomial, normal, t,
and F distributions. Another distribution of considerable theoretical and
practical importance is the distribution of chi square, or $\chi^2$. In many
experimental situations we wish to compare observed with theoretical
frequencies. The observed frequencies are those obtained empirically by
direct observation or experiment. The theoretical frequencies are generated
on the basis of some hypothesis, or line of theoretical speculation,
which is independent of the data at hand. The question arises as to
whether the differences between the observed and theoretical frequencies
are significant. If they are, this constitutes evidence for the rejection of
the hypothesis or theory that gave rise to the theoretical frequencies.

Consider, for example, a die. We may formulate the hypothesis
that the die is unbiased, in which case the probability of throwing any
of the six possible values in a single toss is 1/6. The frequencies expected
on the basis of this hypothesis are the theoretical frequencies. In a series
of 300 throws the expected or theoretical frequencies of 1, 2, 3, 4, 5, and 6
are 50, 50, 50, 50, 50, and 50. Let us now experiment by throwing the die
300 times. The observed frequencies of the values from 1 to 6 are 43, 55,
39, 56, 63, and 44. May the differences between the observed and theoretical
frequencies be considered to result from sampling error? Are the
differences highly improbable on the basis of the null hypothesis, thereby
providing evidence for the rejection of the hypothesis that the die is
unbiased?

As a further illustration, let us formulate the hypothesis that in litters 
of rabbits the probability of any birth being either male or female is 1/2.
Using the binomial distribution we ascertain that the expected or theo- 
retical frequencies of 0, 1, 2, 3, 4, 5, and 6 males in 64 litters of 6 rabbits are 
1, 6, 15, 20, 15, 6, and 1. By counting the number of males in 64 litters 
of six rabbits, the corresponding observed frequencies are 0, 3, 14, 19, 20, 
6, and 2. Do the observed and theoretical frequencies differ? 





Consider another example. In a market research project two 
varieties of soap, A and B, are distributed to a random sample of 200 
housewives. After a period of use the housewives are asked which they 
prefer. The results indicate that 115 prefer A and 85 prefer B. The
hypothesis may be formulated that no difference exists in consumer
preference for the two varieties of soap; that a 50:50 split exists. Do 
the observed frequencies constitute evidence for the rejection of this 
hypothesis? 

The statistic $\chi^2$ is used in situations of the type described above where
a comparison of observed and theoretical frequencies is required. It has
extensive application in statistical work. $\chi^2$ is defined by

$$\chi^2 = \sum \frac{(O - E)^2}{E} \tag{13.1}$$

where O = an observed frequency
      E = an expected or theoretical frequency

Thus to calculate a value of $\chi^2$ we find the differences between the observed
and expected values, square these, divide each squared difference by the
appropriate expected value, and sum over all frequencies.

Table 13.1 illustrates the calculation of $\chi^2$ in comparing the observed
and expected frequencies for 300 throws of a die. Note that the sum
of both the observed and expected frequencies is equal to N; that is,
$\sum O = \sum E = N$. The value of $\chi^2$ obtained in Table 13.1 is 8.72. This is
a measure of the discrepancy between the observed and theoretical
frequencies. If the discrepancy is large, $\chi^2$ is large. If the discrepancy is
small, $\chi^2$ is small. Does a value of $\chi^2$ = 8.72 constitute evidence at an
accepted level of significance for rejecting the null hypothesis? The
answer to this question demands a consideration of the sampling
distribution of $\chi^2$.


Table 13.1

Calculation of $\chi^2$ in comparing observed and expected frequencies
for 300 throws of a die

Value of    Observed     Expected
die         frequency    frequency
X           O            E            O - E    (O - E)^2    (O - E)^2/E

1            43           50            -7         49           .98
2            55           50             5         25           .50
3            39           50           -11        121          2.42
4            56           50             6         36           .72
5            63           50            13        169          3.38
6            44           50            -6         36           .72

Total       300          300             0                $\chi^2$ = 8.72
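
The calculation of Table 13.1 can be sketched as a small Python function applying formula 13.1. The name `chi_square` is hypothetical; the observed frequencies are those of the table, and the expected frequencies follow from the hypothesis of an unbiased die.

```python
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all frequencies (formula 13.1)."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Observed frequencies of the faces 1 to 6 in 300 throws (Table 13.1).
observed = [43, 55, 39, 56, 63, 44]
# An unbiased die gives each face one-sixth of the throws.
expected = [sum(observed) / 6] * 6

chi2 = chi_square(observed, expected)
print(round(chi2, 2))
```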


13.2

The sampling distribution of chi square 

The sampling distribution of $\chi^2$ may be illustrated with reference to the
tossing of coins. Let us assume that in tossing 100 unbiased coins 46
heads and 54 tails result. The expected frequencies are 50 heads and 50
tails. A value of $\chi^2$ may be calculated as follows:

           O      E     O - E    (O - E)^2    (O - E)^2/E

Heads      46     50      -4         16           .32
Tails      54     50       4         16           .32

                                            $\chi^2$ = .64

In the tossing of 100 coins two frequencies are obtained, one for heads
and one for tails. These frequencies are not independent. If the frequency
of heads is 46, the frequency of tails is 100 - 46 = 54. If the
frequency of heads is 42, the frequency of tails is 100 - 42 = 58. Quite
clearly, given either frequency, the other is determined. One frequency
only is free to vary. In this situation 1 degree of freedom is associated
with the value of $\chi^2$.

Let us toss the 100 coins a second time, a third time, and so on, to
obtain different values of $\chi^2$. A large number of trials may be made, and
a large number of values of $\chi^2$ obtained. The frequency distribution of
these values is an experimental sampling distribution of $\chi^2$ for 1 degree of
freedom. It describes the variation in $\chi^2$ with repeated sampling. By
inspecting this experimental sampling distribution estimates may be
made of the proportion of times, or the probability, that values of $\chi^2$
equal to or greater than any given value will occur due to sampling
fluctuation for 1 degree of freedom. In the present illustration this assumes,
of course, that the coins are unbiased.
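
The experimental sampling distribution described here can be imitated by simulation. The sketch below is an illustration only: the function names, the number of trials, and the seed are all assumptions, and the coin tosses are pseudorandom. It counts how often $\chi^2$ reaches 3.84, the tabled 5 per cent value for 1 degree of freedom. Because the number of heads is a whole number, the simulated proportion will be near, but not exactly, .05.

```python
import random


def chi_square_heads_tails(heads, n_coins):
    """Chi square for heads and tails against a 50:50 expectation."""
    expected = n_coins / 2
    tails = n_coins - heads
    return (heads - expected) ** 2 / expected + (tails - expected) ** 2 / expected


def simulate_chi_square(n_trials, n_coins=100, seed=0):
    """Toss n_coins unbiased coins n_trials times; return one chi square per trial."""
    rng = random.Random(seed)  # seeded so the run can be repeated
    values = []
    for _ in range(n_trials):
        heads = sum(rng.random() < 0.5 for _ in range(n_coins))
        values.append(chi_square_heads_tails(heads, n_coins))
    return values


values = simulate_chi_square(5000)
proportion = sum(v >= 3.84 for v in values) / len(values)
print(round(proportion, 3))
```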

Instead of tossing 100 coins, let us throw 100 unbiased dice, obtain
observed and expected frequencies, and calculate a value of $\chi^2$. In this
situation if any five frequencies are known, the sixth is determined.
Five degrees of freedom are associated with the value of $\chi^2$ obtained.
The 100 dice may be tossed a great many times, a value of $\chi^2$ calculated
for each trial, and a frequency distribution made. This frequency
distribution is an experimental sampling distribution of $\chi^2$ for 5 degrees of
freedom.

Fig. 13.1 Chi-square distribution and 5 per cent critical regions for various
degrees of freedom

The theoretical sampling distribution of $\chi^2$ is known, and probabilities
may be estimated from it without using the elaborate experimental
approach described for illustrative purposes above. The equation for
$\chi^2$ is complex and is not given here. It contains the number of degrees
of freedom as a variable. This means that a different sampling distribution
of $\chi^2$ exists for each value of df. Figure 13.1 shows different chi-square
distributions for different values of df. $\chi^2$ is always positive, a
circumstance which results from squaring the difference between the
observed and expected values. Values of $\chi^2$ range from zero to infinity.
The right-hand tail of the curve is asymptotic to the abscissa. For 1
degree of freedom the curve is asymptotic to the ordinate as well as to the
abscissa.

The $\chi^2$ distribution is used in tests of significance in much the same
way that the normal, t, or F distributions are used. The null
hypothesis is assumed. This hypothesis states that no actual differences
exist between the observed and expected frequencies. A value of $\chi^2$ is
calculated. If this value is equal to or greater than the critical value
required for significance at an accepted significance level for the appropriate
df, the null hypothesis is rejected. We may state that the differences
between the observed and expected frequencies are significant and cannot
reasonably be explained by sampling fluctuation. Table C in the Appendix
shows values of $\chi^2$ required for significance at various probability
levels for different values of df. The critical values at the 5 and 1 per cent
levels for df = 1 are, respectively, 3.84 and 6.64. This means that 5 and
1 per cent of the area of the curve fall to the right of ordinates erected at
distances 3.84 and 6.64 measured along the base line from a zero origin.
For df = 5, the corresponding 5 and 1 per cent critical values are 11.07
and 15.09.

Table C of the Appendix provides the 5 and 1 per cent critical values
for df = 1 to df = 30. This covers the great majority of situations
ordinarily encountered in practice. Situations where a $\chi^2$ is calculated based
on a df greater than 30 are infrequent. Where df is greater than 30 the
expression $\sqrt{2\chi^2} - \sqrt{2\,df - 1}$ has a sampling distribution which is
approximately normal. Values of this expression required for significance
at the 5 and 1 per cent levels are 1.64 and 2.33.

Table C of the Appendix provides, in addition to critical $\chi^2$ values at
the 5 and 1 per cent levels, values at other per cent or probability levels.
Values are given to the right of which 99, 95, 90, and other percentages of
the area of the curve lie. For example, for df = 5, 95 per cent of the area
of the curve falls to the right of $\chi^2$ = 1.14 and 99 per cent to the right of
$\chi^2$ = .55. For df = 5, a value of $\chi^2 \le 1.14$ is just as improbable on the
basis of sampling fluctuation as a value of $\chi^2 \ge 11.07$, the critical value
at the 5 per cent level. Very close agreement between observed and
expected values may be a highly improbable event. Where an improbably
small value of $\chi^2$ is obtained, either the data or the calculation is suspect
and should be subject to careful scrutiny.


13.3

Goodness of fit 

Numerous examples may be found to illustrate the goodness of fit of a
theoretical to an observed frequency distribution. In one experiment,
Abbé Mendel observed the shape and color of peas in a sample of plants.
The distribution is shown in Table 13.2. According to his
genetic theory the expected frequencies should follow the ratio 9:3:3:1.
The correspondence between observed and expected frequencies is close.
The value of $\chi^2$ = .470, and no grounds exist for rejecting the null
hypothesis. The data lend confirmation to the theory. The value of $\chi^2$ is
smaller than we should ordinarily expect, the probability associated with
it being between .90 and .95. Assuming the null hypothesis, a fit as
good as or better than the one observed may be expected to occur in between
5 and 10 per cent of samples of the same size.
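
The expected frequencies under the 9:3:3:1 ratio and the resulting $\chi^2$ can be sketched as follows. The dictionary keys and variable names are illustrative assumptions; the observed counts are those of Table 13.2.

```python
# Observed frequencies from Table 13.2.
observed = {
    "round yellow": 315,
    "round green": 108,
    "angular yellow": 101,
    "angular green": 32,
}
# Theoretical ratio 9:3:3:1, expressed in sixteenths.
ratio = {
    "round yellow": 9,
    "round green": 3,
    "angular yellow": 3,
    "angular green": 1,
}

total = sum(observed.values())
expected = {kind: total * part / 16 for kind, part in ratio.items()}
chi2 = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)

print(expected)
print(round(chi2, 3))
```

With four frequencies constrained only to sum to N, the value is referred to the chi-square table with 3 degrees of freedom.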

In testing goodness of fit the hypothesis may be entertained that the 
distribution of a variable conforms to some widely known distribution 
such as the binomial or normal distribution. Johnson (1949), in order to 
illustrate the goodness of fit of the theoretical binomial distribution to an 
observed distribution, tossed 10 coins 512 times and recorded the propor- 
tion of tails. His data are shown in Table 13.3, together with the cor- 
responding theoretical binomial frequencies. The mean and standard 
deviation of the observed distribution are $\bar{X}$ = 0.5 and s = .162. The
mean and standard deviation of the theoretical binomial are $\bar{X}$ = 0.5
and s = .156.

Note that in the calculation of $\chi^2$ for these data, the small frequencies
at the tails of the distributions are combined, a procedure that is generally
advisable with data of this type. Problems in the application of chi
square resulting from the presence of small frequencies are discussed later
in this chapter. With the present data, combining small frequencies
reduces the number of frequencies from 11 to 9 and the number of degrees
of freedom from 10 to 8. The value of $\chi^2$ for these data is 9.55. The
value required for significance at the 5 per cent level for 8 degrees of freedom
is 15.51. The conclusion is that the evidence is insufficient to justify
rejection of the null hypothesis. Reference to a table of $\chi^2$ shows that a
value of $\chi^2$ equal to or greater than the one observed might be expected to
occur in about 30 per cent of samples due to sampling fluctuation alone.
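
The theoretical binomial frequencies and the pooling of the tail classes can be sketched in Python. The variable names are assumptions made for illustration; the observed frequencies are the combined values of Table 13.3, listed here from the fewest tails to the most.

```python
from math import comb

n_coins, n_tosses = 10, 512

# Expected frequency of k tails in 10 coins, for k = 0..10, with p = 1/2.
expected_by_tails = [
    n_tosses * comb(n_coins, k) / 2 ** n_coins for k in range(n_coins + 1)
]

# Combine the two extreme classes at each end, as in Table 13.3.
expected = (
    [expected_by_tails[0] + expected_by_tails[1]]
    + expected_by_tails[2:9]
    + [expected_by_tails[9] + expected_by_tails[10]]
)

# Observed frequencies after the same combining (proportions 0.0-0.1 up to 0.9-1.0).
observed = [10, 23, 55, 95, 134, 105, 68, 15, 7]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
print(expected)
print(round(chi2, 3), df)
```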

Table 13.2

Comparison of observed and expected frequencies in shape and
color of peas in experiment by Mendel

                     O        E        O - E    (O - E)^2    (O - E)^2/E

Round yellow        315     312.75      2.25       5.06         .016
Round green         108     104.25      3.75      14.06         .135
Angular yellow      101     104.25     -3.25      10.56         .101
Angular green        32      34.75     -2.75       7.56         .218

Total               556     556.00      0.00              $\chi^2$ = .470


Where the theoretical frequency distribution is continuous, we
require a method for the estimation of the theoretical frequencies. In
fitting a continuous curve we calculate the proportion of the area under
the theoretical curve corresponding to each class interval. This proportion
multiplied by N is taken as the theoretical frequency within the class
interval. This procedure is illustrated in Table 13.4. The data are
adapted from McNemar and are Stanford-Binet IQ's, Form M,
for a sample of 2,970 individuals.

Table 13.3

Goodness of fit of binomial distribution to observed distribution
of proportion of tails from 512 tosses of 10 coins

Proportion
of tails         O        E       O - E    (O - E)^2/E

1.0, 0.9          7       5.5      1.5        0.409
0.8              15      22.5     -7.5        2.500
0.7              68      60.0      8.0        1.067
0.6             105     105.0      0.0        0.000
0.5             134     126.0      8.0        0.508
0.4              95     105.0    -10.0        0.952
0.3              55      60.0     -5.0        0.417
0.2              23      22.5      0.5        0.011
0.1, 0.0         10       5.5      4.5        3.682

Total           512     512.0             $\chi^2$ = 9.546


We are required to calculate the theoretical normal frequencies for
the class intervals and test the goodness of fit between the theoretical and
the observed. The mean and standard deviation are $\bar{X}$ = 104.56 and
s = 16.99. A normal frequency distribution is required with the same
N, $\bar{X}$, and s as the observed distribution. We proceed by combining the
small frequencies at the tails of the distribution, as shown in col. 2. This
reduces the number of frequencies from 14 to 11. The frequency of 16
at the top of the distribution contains all cases above the exact limit
149.5. The frequency of 12 at the bottom of the distribution contains all
cases below the exact limit 59.5. We next record the exact upper limits
of the class intervals (col. 3), convert these to deviations from the mean of
104.56 (col. 4), and divide by the standard deviation 16.99 to obtain the
standard scores x/s (col. 5). Thus the exact upper limits of the class
intervals are expressed in standard measure. For example, the exact upper
limit of the interval 140 to 149 is 149.5. This as a deviation from the
mean is 149.5 - 104.56 = 44.94. Dividing this by the standard deviation
16.99, we obtain 44.94/16.99 = 2.645. We then consult a table of
areas under the normal curve and ascertain the proportions of the area
under the normal curve falling below the standard-score values x/s of
col. 5. These proportions are shown in col. 6. We observe that a
proportion .9959 of the area of the normal curve falls below 2.645 standard
deviation units above the mean, a proportion .9801 falls below 2.057 units
above the mean, and so on. By subtraction we obtain the proportions
of the area of the normal curve falling within the class intervals (col. 7).
The proportion above the exact limit 149.5 is 1.0000 - .9959 = .0041.
The proportion between 139.5 and 149.5 is .9959 - .9801 = .0158.
The proportion between 129.5 and 139.5 is .9801 - .9289 = .0512, and
so on. By multiplying these proportions by N we obtain the expected
frequencies (col. 8).

The above method simply involves converting the exact limits of the 
class intervals to standard deviation units, Using a table of areas under 


Table 13.4

Calculation of normal distribution frequencies for
Stanford-Binet IQ's, Form M*

    1         2         3         4          5          6           7           8
  Class                Upper   Deviation             Proportion  Proportion  Expected
 interval     O        limit   from mean    x/s        below       within    frequency
                                   x

 160-          3 }
 150-159      13 } 16                                              .0041         12
 140-149      55       149.5     44.94     2.645       .9959       .0158         47
 130-139     120       139.5     34.94     2.057       .9801       .0512        152
 120-129     330       129.5     24.94     1.468       .9289       .1186        352
 110-119     610       119.5     14.94      .879       .8103       .1958        582
 100-109     719       109.5      4.94      .291       .6145       .2316        688
  90-99      592        99.5     -5.06     -.298       .3829       .1950        579
  80-89      338        89.5    -15.06     -.886       .1879       .1177        350
  70-79      130        79.5    -25.06    -1.475       .0702       .0506        150
  60-69       48        69.5    -35.06    -2.064       .0196       .0155         46
  50-59        7 }      59.5    -45.06    -2.652       .0041       .0041         12
  40-49        4 } 12
  30-39        1 }

 Total     2,970                                                  1.0000      2,970





the normal curve to find the proportion of the area within these limits, and
multiplying these proportions by N to obtain the expected frequencies.
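The conversion just described may be sketched in Python as follows. This is an illustration only and not part of the original computation. The function names are our own, the mean of 104.56 is inferred from col. 4 of Table 13.4 (149.5 - 44.94), and the error function stands in for a printed table of normal-curve areas, so the results should agree with col. 8 only to within rounding.

```python
import math


def normal_cdf(z):
    """Proportion of the area of the normal curve falling below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_normal_frequencies(upper_limits, mean, sd, n):
    """Expected frequencies for classes cut at descending exact upper limits.

    One value is returned per class: the open class above the first limit,
    the classes between successive limits, and the open class below the last.
    """
    below = [normal_cdf((u - mean) / sd) for u in upper_limits]
    # Pad with 1.0 above the top class and 0.0 below the bottom class.
    cumulative = [1.0] + below + [0.0]
    within = [hi - lo for hi, lo in zip(cumulative, cumulative[1:])]
    return [n * p for p in within]


# Exact upper limits from col. 3 of Table 13.4 (tail classes combined).
upper_limits = [149.5, 139.5, 129.5, 119.5, 109.5, 99.5, 89.5, 79.5, 69.5, 59.5]
mean = 149.5 - 44.94  # assumption: mean inferred from the tabled deviation
sd = 16.99
expected = expected_normal_frequencies(upper_limits, mean, sd, 2970)

for e in expected:
    print(round(e))
```

The eleven values printed correspond to the eleven classes that remain after the small frequencies at the two tails are combined.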

Table 13.5 shows the calculation of χ² in comparing the observed with
expected frequencies. A value of χ² = 17.02 is obtained. In this case
the number of df is 11 - 3 = 8. Although there are 11 frequencies, 8
only are free to vary. The loss of 3 degrees of freedom results because the
observed and expected distributions are made to agree on N, X̄, and s.
For df = 8, the value of χ² required for significance at the 5 per cent level
is 15.51 and at the 1 per cent level 20.09. The obtained χ² falls between
these two values at about the 3 per cent level. Thus the chances are 3 in
100 that a fit as poor as or worse than the one observed would result in
random sampling from a normal population. This establishes grounds
for rejecting the hypothesis that the distribution of Stanford-Binet IQ's,
Form M, is normal. The departures from normality are, however, not
gross.
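The arithmetic of this test may be checked with a short Python sketch. It is offered as an illustration under stated assumptions: the function names are our own, the frequencies are those of Table 13.5 with the tail classes combined, and the tail area uses a closed form that holds only for an even number of degrees of freedom (here 8) in place of a printed table of χ².

```python
import math


def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_tail_even_df(x, df):
    """Area to the right of x under the chi-square curve, even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    terms = (half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * sum(terms)


# Observed and expected frequencies from Table 13.5 (tails combined).
observed = [16, 55, 120, 330, 610, 719, 592, 338, 130, 48, 12]
expected = [12, 47, 152, 352, 582, 688, 579, 350, 150, 46, 12]

chi2 = chi_square(observed, expected)
df = len(observed) - 3  # agreement imposed on N, the mean, and s
p = chi_square_tail_even_df(chi2, df)

print(round(chi2, 2), df, round(p, 3))
```

Because the individual terms are not rounded before summing, the value obtained differs from the tabled 17.02 in the second decimal place.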

Chi square may be used to test the representativeness of a sample 


Table 13.5

Goodness of fit of normal frequencies to frequency distribution of
Stanford-Binet IQ's, Form M*

  Class
 interval      O             E       O - E    (O - E)²/E

 160-           3 }
 150-159       13 } 16       12         4        1.33
 140-149       55            47         8        1.36
 130-139      120           152       -32        6.74
 120-129      330           352       -22        1.38
 110-119      610           582        28        1.35
 100-109      719           688        31        1.40
  90-99       592           579        13         .29
  80-89       338           350       -12         .41
  70-79       130           150       -20        2.67
  60-69        48            46         2         .09
  50-59         7 }
  40-49         4 } 12       12         0         .00
  30-39         1 }

 Total      2,970         2,970         0     χ² = 17.02





where certain population values are known. This in effect is a test of
goodness of fit. To illustrate, in a study of attitudes toward immigrants,
a sample of 200 cases is drawn from the city of Montreal. The observed
frequencies and percentages by racial origin are shown in Table 13.6.

The population percentages obtained from census returns are also
shown. These population percentages are used to obtain the expected, or
theoretical, frequencies. The value of χ² is 27.41. For df = 2 this is
highly significant, the value required for significance at the 1 per cent
level being 9.21. We may conclude that the sample is biased and cannot
be considered a random sample with respect to racial origin. Since
attitudes toward immigrants may be linked to racial origin, results
obtained on this sample may be highly questionable unless a correction is
applied to adjust for the sample bias.
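A test of representativeness of this kind may be sketched in Python as below. The variable names are our own. The expected frequencies are rounded to whole numbers, which is what the tabled value of 27.41 implies; this rounding is our assumption about how that figure was reached. For 2 degrees of freedom the tail area of χ² has the simple form exp(-χ²/2), so no table is needed.

```python
import math

# Observed sample frequencies and census percentages (Table 13.6).
observed = [95, 67, 38]              # French, English, Other
population_pct = [62.5, 19.4, 18.1]

n = sum(observed)
# Expected frequencies, rounded to whole numbers (an assumption).
expected = [round(n * pct / 100) for pct in population_pct]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
p = math.exp(-chi2 / 2.0)            # exact tail area when df = 2

print(expected, round(chi2, 2), df)
```

If the expected frequencies are left unrounded (125, 38.8, and 36.2), a somewhat larger χ² results, and the conclusion is unchanged.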

13.4

Tests of independence

A frequent application of chi square occurs where the data consist
of paired observations on two nominal variables. We wish to know
whether the variables are independent of each other or associated. To
illustrate, Table 13.7 presents data collected by Woo (1928) on the
relationship between eyedness and handedness in a sample of 413 subjects.
Subjects were tested for eyedness and handedness and grouped in one of
three categories on both variables. Paired observations were available
for each subject. One subject was left-handed and ambiocular, another
right-handed and right-eyed, and so on. The paired observations were
entered in a bivariate frequency table as shown in Table 13.7. Such
tables are analogous to correlation tables. They are used to study the
independence or association of the two variables. Tables of this kind


Table 13.6

Application of χ² in comparing sample frequencies of racial origin
with population frequencies

 Racial               Sample,    Population,
 origin       O       per cent    per cent        E      O - E    (O - E)²/E

 French       95        47.5        62.5         125      -30        7.20
 English      67        33.5        19.4          39       28       20.10
 Other        38        19.0        18.1          36        2         .11

 Total       200       100.0       100.0         200        0     χ² = 27.41








are spoken of as contingency tables. With such tables chi square provides 
an appropriate test of independence. 

In applying chi square to a contingency table to test independence,
the expected cell frequencies are derived from the data. The expected 
cell frequencies are those we should expect to obtain if the two variables 
were independent of each other, given the marginal totals of the rows and 
columns. Chi square provides a measure of the discrepancy between the 
observed cell frequencies and those expected on the basis of independence. 
If the value of chi square is considered significant at some accepted level, 
usually either the 5 or the 1 per cent level, we reject the null hypothesis 
that no difference exists between the observed and expected values. We 
then accept the alternative hypothesis that the two variables are 
associated. 

How are the expected cell frequencies calculated? The marginal
totals to the right of Table 13.7 show that 124 subjects were left-handed,
75 ambidextrous, and 214 right-handed. The proportions in these three
categories are 124/413, 75/413, and 214/413. These proportions are the
probabilities that an individual, selected at random from the sample of 413
individuals, is left-handed, ambidextrous, or right-handed. The marginal
totals at the bottom of Table 13.7 show that 118 subjects are left-eyed,
195 ambiocular, and 100 right-eyed. The proportions in these three
categories are


Table 13.7

Contingency table showing relationship between eye and hand
laterality for 413 subjects and calculation of expected values

                 Left-eyed    Ambiocular    Right-eyed    Total

 Left-handed         34           62            28         124
                   (35.4)       (58.5)        (30.0)

 Ambidextrous        27           28            20          75
                   (21.4)       (35.4)        (18.2)

 Right-handed        57          105            52         214
                   (61.1)      (101.0)        (51.8)

 Total              118          195           100         413

Calculation of expected values:

 (124 × 118)/413 = 35.4     (124 × 195)/413 = 58.5      (124 × 100)/413 = 30.0
 (75 × 118)/413 = 21.4      (75 × 195)/413 = 35.4       (75 × 100)/413 = 18.2
 (214 × 118)/413 = 61.1     (214 × 195)/413 = 101.0     (214 × 100)/413 = 51.8




118/413, 195/413, and 100/413. These are the probabilities that an
individual is left-eyed, ambiocular, or right-eyed. Assuming the
independence of the two variables, what are the expected probabilities
associated with the joint events, or what is the expected proportion of
individuals who are both left-handed and left-eyed, both left-handed and
ambiocular, and so on? The multiplication theorem of probability states
that the probability of the joint occurrence of two or more mutually
independent events is the product of their separate probabilities. The
joint probabilities are obtained, therefore, by multiplying the
probabilities obtained from the marginal totals. The probability that any
individual, selected at random from the 413 individuals, is left-handed is
124/413. The probability that any individual is left-eyed is 118/413.
If handedness and eyedness are independent, the probability that
any individual is both left-eyed and left-handed is the product of the
separate probabilities, or (124/413) × (118/413). This is the expected
proportion in the top left-hand cell in Table 13.7. We require, however,
not the expected proportion, but the expected frequency. This is obtained
by multiplying the expected proportion by N, in this case 413. Thus the
expected frequency is (124/413)(118/413)(413) = (124 × 118)/413 = 35.4.
We observe that for computational purposes the expected cell frequency is
obtained by multiplying together the corresponding row and column totals
and dividing by N. Similarly, the other expected cell frequencies may be
obtained. The expected frequency of left-handed ambiocular individuals
is (124 × 195)/413 = 58.5, that of left-handed right-eyed individuals is
(124 × 100)/413 = 30.0, and so on. The expected cell frequencies are
shown in parentheses in Table 13.7.
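The rule of multiplying the row total by the column total and dividing by N can be applied to every cell at once. The following Python sketch does so for the data of Table 13.7. The function name is our own, and the code is meant only as an illustration of the rule stated above.

```python
def expected_frequencies(table):
    """Expected cell frequencies under independence.

    Each expected value is (row total x column total) / N.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Rows: left-handed, ambidextrous, right-handed.
# Columns: left-eyed, ambiocular, right-eyed.
observed = [
    [34, 62, 28],
    [27, 28, 20],
    [57, 105, 52],
]

expected = expected_frequencies(observed)
for row in expected:
    print([round(e, 1) for e in row])
```

Each row of expected values sums to the corresponding row total, and each column to the corresponding column total, apart from rounding.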

If eye and hand laterality are independent of each other, the 124 
observations in the first row of Table 13.7 will be distributed in the three 
cells in that row in a manner proportional to the column sums. The 
expected values 35.4, 58.5, and 30.0 are proportional to the column sums 
118, 195, and 100. Likewise, the 118 cases in the first column will be
distributed in the three cells in that column in a manner proportional to
the row sums. The expected values 35.4, 21.4, and 61.1 are proportional
to the row sums 124, 75, and 214. A similar proportionality exists
throughout the table. The expected cell frequencies in the rows and
columns of any contingency table are proportional to the marginal totals.

The calculation of χ² for a contingency table is similar to that for
tests of goodness of fit. The difference between each observed and
expected value is squared and divided by the expected value, obtaining
(O - E)²/E. These values are summed over all cells to obtain χ². The
calculation is perhaps most readily accomplished by writing the data in
columnar fashion as shown in Table 13.8. The value of χ² obtained is
4.021. The number of degrees of freedom associated with this value of
χ² is 4. The value of χ² required for significance at the 5 per cent level





is 9.488. We have therefore no grounds for rejecting the hypothesis of
independence between eye and hand laterality. Apparently there is no
relationship between these two variables.
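The complete test of independence for the eyedness and handedness data may be sketched in Python as follows. The function names are our own. The expected values are carried unrounded, so χ² agrees with the tabled 4.021 only to about two decimal places. The tail area uses a closed form valid for an even number of degrees of freedom, which happens to apply here (df = 4).

```python
import math


def chi_square_independence(table):
    """Return (chi square, df) for an R x C contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected under independence
            chi2 += (o - e) ** 2 / e
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df


def chi_square_tail_even_df(x, df):
    """Area to the right of x under the chi-square curve, even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


observed = [
    [34, 62, 28],
    [27, 28, 20],
    [57, 105, 52],
]
chi2, df = chi_square_independence(observed)
p = chi_square_tail_even_df(chi2, df)
print(round(chi2, 2), df, round(p, 2))
```

The probability obtained is far above .05, in agreement with the conclusion that the hypothesis of independence cannot be rejected.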

How is the number of degrees of freedom calculated? In testing
independence in any contingency table comprised of R rows and C columns
the number of degrees of freedom is given by (R - 1)(C - 1).
For Table 13.7, R = 3 and C = 3. The number of degrees of freedom is
(3 - 1)(3 - 1) = 4. For a table comprised of two rows and two columns,
referred to as a 2 × 2, or fourfold, table, the number of degrees of
freedom is (2 - 1)(2 - 1) = 1. Consider for explanatory purposes the
2 × 2 table

                      60
                      40
          30    70   100

Given the restrictions of the marginal totals, if one cell value is known, the
remaining three values are determined. Thus if we know that the value
in the top left cell is 25, the top right cell must be 60 - 25 = 35, the
bottom left 30 - 25 = 5, and the bottom right 40 - 5 = 35. If one cell
value is known, no freedom of variation remains. One degree of freedom
only is associated with the variation in the data. Similarly, in Table 13.7
only four cell values are free to vary. Given the marginal totals and four
cell values, the remaining cell values are determined.

          25    35    60
           5    35    40
          30    70   100


Table 13.8

Calculation of χ² for data of Table 13.7

    O         E       O - E     (O - E)²    (O - E)²/E

    34       35.4      -1.4       1.96         .055
    62       58.5       3.5      12.25         .209
    28       30.0      -2.0       4.00         .133
    27       21.4       5.6      31.36        1.465
    28       35.4      -7.4      54.76        1.547
    20       18.2       1.8       3.24         .178
    57       61.1      -4.1      16.81         .275
   105      101.0       4.0      16.00         .158
    52       51.8        .2        .04         .001

   413      412.8                           χ² = 4.021






A frequently occurring type of contingency table is the 2 × 2, or
fourfold, contingency table. A χ² test for independence can be readily
obtained for such a table without calculating the expected values. Let us
represent the cell and marginal frequencies by the following notation:

       A         B       A + B
       C         D       C + D
     A + C     B + D       N

Chi square may then be calculated by the formula

\[ \chi^2 = \frac{N(AD - BC)^2}{(A + B)(C + D)(A + C)(B + D)} \tag{13.2} \]

Note that the term in the numerator, AD - BC, is simply the difference
between the two cross products and the term in the denominator is the
product of the four marginal totals.

Consider the following 2 × 2 table showing the relationship between
ratings of successful or unsuccessful on a job and pass or fail on an
ability-test item.

                             Test item
                           Fail     Pass

 Job      Successful        20       40       60
 rating   Unsuccessful      25       15       40

                            45       55      100

Is there an association between performance on the job and performance
on the test item? Does the item differentiate significantly between the
successful and unsuccessful individuals? Chi square is as follows:

\[ \chi^2 = \frac{100(20 \times 15 - 40 \times 25)^2}{60 \times 40 \times 45 \times 55} = 8.25 \]

For df = 1, a χ² = 8.25 is significant at better than the 1 per cent level.
The data provide fairly conclusive evidence that the test item
differentiates between individuals on the basis of their job performance.
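The shortcut formula for a fourfold table is easily programmed. The Python sketch below applies it to the job-rating data. The function names are our own. For 1 degree of freedom the tail area of χ² can be had from the complementary error function, which is used here in place of a printed table.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Chi square for a fourfold table with cells A, B (top) and C, D (bottom)."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


def p_value_1df(chi2):
    """Area to the right of chi2 under the chi-square curve with 1 df."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Successful: 20 fail, 40 pass. Unsuccessful: 25 fail, 15 pass.
chi2 = chi_square_2x2(20, 40, 25, 15)
p = p_value_1df(chi2)
print(round(chi2, 2), round(p, 4))
```

The probability obtained is well below .01, as the text states.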

13.5

The application of χ² in testing the significance
of the difference between proportions

In Chap. 12 procedures were described for testing the significance of the
difference between both independent and correlated proportions. These




procedures involved dividing the difference between two proportions by
the standard error of the difference to obtain a normal deviate which
could be referred to a table of areas under the normal curve. Because of a
simple relationship for 1 degree of freedom between χ² and the normal
deviate, χ² provides an alternative but equivalent procedure for testing
the significance of the difference between proportions. For 1 degree
of freedom it may be shown that χ² is equal to the normal deviate squared.
Thus χ² = (x/s)² = z², or √χ² = z.

We shall now consider the use of χ² in testing the significance of the
difference between proportions for independent samples. Let the
following be data obtained in response to an attitude-test statement for a
group of males and females:


                  Frequency                            Proportion

            Agree   Disagree   Total               Agree   Disagree

 Males        70       70       140      Males      .500     .500
 Females      20       40        60      Females    .333     .667

 Total        90      110       200      Total      .450     .550


The numbers of males and females are N₁ = 140 and N₂ = 60,
respectively. The proportions of males and females indicating agreement to
the attitude statement are p₁ = 70/140 = .500 and p₂ = 20/60 = .333. Is
there a significant difference in the attitudes of males and females? To
apply the method previously described we calculate a proportion p based
on a combination of data for the two samples. With the above data

\[ p = \frac{70 + 20}{140 + 60} = \frac{90}{200} = .450 \]

\[ q = 1 - p = 1 - .450 = .550 \]

The required normal deviate is then

\[ z = \frac{p_1 - p_2}{\sqrt{pq\,[(1/N_1) + (1/N_2)]}} = \frac{.500 - .333}{\sqrt{.450 \times .550\,\left(\frac{1}{140} + \frac{1}{60}\right)}} = 2.172 \]

The difference between the two proportions falls between the 5 and 1 per
cent levels. Reference to a table of areas under the normal curve shows
that the proportion of the area falling beyond plus and minus 2.172
standard deviation units from the mean is close to .03. The difference may
be taken as significant at about the 3 per cent level. Let us now apply the
formula for calculating χ² for a 2 × 2 contingency table to the same data.







We have

\[ \chi^2 = \frac{N(AD - BC)^2}{(A + B)(C + D)(A + C)(B + D)} = \frac{200(70 \times 40 - 70 \times 20)^2}{140 \times 60 \times 90 \times 110} = 4.714 \]

Consulting a table of χ² with 1 degree of freedom, we observe that the
proportion of the area in the tail of the distribution of χ² is about .03 and
the difference between proportions may be said to be significant at about
the 3 per cent level. We observe also that z² = (2.172)² = 4.717, which
agrees with χ² apart from rounding error. The two procedures for testing
the significance of the difference between proportions for independent
samples lead to identical results. From a computational viewpoint the χ²
test is the more convenient. Considerations pertaining to small frequencies
apply also to the application of χ² in testing the significance of the
difference between proportions (Sec. 13.6).
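The equivalence of the two procedures can be verified numerically. In the Python sketch below the normal deviate for the difference between two independent proportions is computed and squared, and the result is compared with χ² from the fourfold-table formula. The function names are our own, and the example is intended only as a check on the algebra.

```python
import math


def z_independent_proportions(x1, n1, x2, n2):
    """Normal deviate for the difference between two independent proportions."""
    p1 = x1 / n1
    p2 = x2 / n2
    p = (x1 + x2) / (n1 + n2)  # proportion from the combined samples
    q = 1.0 - p
    return (p1 - p2) / math.sqrt(p * q * (1.0 / n1 + 1.0 / n2))


def chi_square_2x2(a, b, c, d):
    """Chi square for a fourfold table with cells A, B (top) and C, D (bottom)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Males: 70 agree, 70 disagree. Females: 20 agree, 40 disagree.
z = z_independent_proportions(70, 140, 20, 60)
chi2 = chi_square_2x2(70, 70, 20, 40)

print(round(z, 3), round(z * z, 3), round(chi2, 3))
```

When no intermediate rounding is done, the square of the normal deviate and χ² agree to many decimal places.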

Where the data are correlated and are composed of paired
observations the normal deviate for testing the significance of the
differences between proportions is given (Sec. 12.3) by the formula

\[ z = \frac{D - A}{\sqrt{A + D}} \]

where D and A are cell frequencies in the bottom right and top left cells,
respectively, of a 2 × 2 table. Instead of calculating a critical ratio and
referring this to the normal curve, we may calculate χ² by the formula

\[ \chi^2 = \frac{(D - A)^2}{A + D} \tag{13.3} \]

For the data shown in Sec. 12.3, where we wish to test the significance of
the differences between proportions of agreements to an attitude question
for the same individuals tested on two occasions, we obtain a z = 3.16.
The difference is significant at better than the 1 per cent level. The value
of the probability is .0016. The value of χ² calculated on the same data is
(3.16)², or 9.986. The probability is the same as before.


13.6

Small expected frequencies

The distribution of χ² used in determining critical significance values is a
continuous theoretical frequency curve. Where the expected frequencies
are small, the actual sampling distribution of χ² may exhibit marked
discontinuity. The continuous curve may provide a poor fit to the data, and
appreciable error may occur in the estimation of probabilities, these being
areas under the continuous χ² curve. The situation here is analogous to





that found in using the normal curve as a fit to the binomial. For small 
values of N the continuous normal curve is a poor fit to the discrete 
binomial. 

For 1 degree of freedom a correction may be applied known as Yates's
correction for continuity. To apply this correction we reduce by .5 the
obtained frequencies that are greater than expectation and increase by .5
the obtained frequencies that are less than expectation. This brings the
observed and expected values closer together and decreases the value of
χ². This correction should be used where any of the expected frequencies
is less than 5; some writers suggest less than 10. For large expected
frequencies the correction will be negligible.

The formula used in computing χ² from a 2 × 2 table can be written
to incorporate Yates's correction for continuity. This formula becomes

\[ \chi^2 = \frac{N(|AD - BC| - N/2)^2}{(A + B)(C + D)(A + C)(B + D)} \tag{13.4} \]

The term |AD - BC| is the absolute difference, that is, the difference
taken regardless of sign. The correction amounts to subtracting N/2
from this absolute difference.

The following data show the relationship between sociometric choices
for a group of 20 Protestant and Jewish school children.

                               Chosen
                        Protestant   Jewish

 Chooser   Jewish            3          5         8
           Protestant       10          2        12

                            13          7        20

The value of χ² using Yates's correction is

\[ \chi^2 = \frac{20(|6 - 50| - 20/2)^2}{8 \times 12 \times 13 \times 7} = 2.65 \]

This value falls at about the 10 per cent level. The evidence does not
justify the rejection of the hypothesis that sociometric choice is
independent of whether the child is Jewish or Protestant. We note that if
χ² is calculated on these data without Yates's correction, a value
χ² = 4.43 is obtained. This value falls at about the 3.5 per cent level. If
Yates's correction had not been used, the result would be considered
significant at better than the 5 per cent level.
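The effect of the correction on these data may be seen in the following Python sketch. The function names and the `yates` switch are our own. The guard that keeps the corrected difference from falling below zero is a common precaution that we have added; it does not come into play with the present data.

```python
import math


def chi_square_2x2(a, b, c, d, yates=False):
    """Chi square for a fourfold table, optionally with Yates's correction."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Subtract N/2 from the absolute difference (never below zero).
        diff = max(diff - n / 2.0, 0.0)
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_1df(chi2):
    """Area to the right of chi2 under the chi-square curve with 1 df."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Jewish choosers: 3 Protestant, 5 Jewish. Protestant choosers: 10 and 2.
corrected = chi_square_2x2(3, 5, 10, 2, yates=True)
uncorrected = chi_square_2x2(3, 5, 10, 2)

print(round(corrected, 2), round(p_value_1df(corrected), 3))
print(round(uncorrected, 2), round(p_value_1df(uncorrected), 3))
```

The two probabilities fall on opposite sides of .05, which is the point of the example.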

With 2 or more degrees of freedom the error introduced by small 
expected frequencies is of less consequence than with 1 degree of freedom. 
An expectation of not less than 2 in each cell will permit the estimation 
of roughly approximate probabilities. If the frequencies are 5 or more, 






good approximations to the exact probabilities are obtained. With 
certain types of data it is a common practice to combine frequencies. In 
testing the goodness of fit of a theoretical to an observed frequency dis- 
tribution, small frequencies at the tails may be combined. On occasion 
it may be possible without serious distortion of the data to combine rows 
and columns of a contingency table to increase the expected cell 
frequencies. 

With 1 degree of freedom where the expected frequencies are small,
an exact test of significance may be applied. This involves the
determination of exact probabilities, as distinct from those estimated from
the continuous χ² curve. An exact test of significance for a 2 × 2 table is
described below.


13.7

Exact test of significance for a 2 × 2 table

An exact test of significance for a 2 × 2 table has been developed by
R. A. Fisher. This test enables the calculation of exact probabilities and
avoids the use of the continuous chi-square distribution to obtain
approximate probabilities. It may be used appropriately where the expected
cell frequencies are small. The principal objection to its use is the
laborious calculation required.

In tossing a number of coins a finite number of events may result.
In tossing six coins, seven outcomes are possible. We may toss 0, 1, 2, 3,
4, 5, or 6 heads. The binomial distribution may be used to determine
the exact probabilities associated with these seven outcomes. Similarly,
for any 2 × 2 table, given the restrictions imposed by the marginal totals,
a finite number of arrangements of the cell frequencies may result. For
example, for the table

      2    1     3
      3    5     8
      5    6    11

only four arrangements of the cell frequencies are possible. These are as
follows:

        1                2                3                4

   0    3    3      1    2    3      2    1    3      3    0    3
   5    3    8      4    4    8      3    5    8      2    6    8
   5    6   11      5    6   11      5    6   11      5    6   11











The exact probability associated with each arrangement may be
calculated. To conceptualize the situation here, consider an urn containing
three black and eight white balls. Withdraw the balls one at a time and
assign five of them at random to a black box and six to a white box.
Count the number of black balls in the black box. Repeat the
experiment many times and calculate the relative frequencies of the four
possible outcomes. These relative frequencies are experimentally
determined estimates of the probabilities of occurrence of the four possible
2 × 2 tables. The required probabilities may be calculated without this
laborious experimental procedure. The probability of any arrangement
of cell frequencies, given the marginal restrictions, is obtained by

\[ p = \frac{(A + B)!\,(C + D)!\,(A + C)!\,(B + D)!}{N!\,A!\,B!\,C!\,D!} \tag{13.5} \]

The numerator is the product of the factorials of the marginal totals.
The denominator is N! times the product of the factorials of the cell
frequencies. The factorial of any number, say, 5, is

5! = 5 × 4 × 3 × 2 × 1 = 120

also 0! = 1. The probabilities associated with the four tables above are

\begin{align*}
1.\quad p_1 &= \frac{3!\,8!\,5!\,6!}{11!\,0!\,3!\,5!\,3!} = \frac{4}{33} = .1212 \\
2.\quad p_2 &= \frac{3!\,8!\,5!\,6!}{11!\,1!\,2!\,4!\,4!} = \frac{15}{33} = .4545 \\
3.\quad p_3 &= \frac{3!\,8!\,5!\,6!}{11!\,2!\,1!\,3!\,5!} = \frac{12}{33} = .3636 \\
4.\quad p_4 &= \frac{3!\,8!\,5!\,6!}{11!\,3!\,0!\,2!\,6!} = \frac{2}{33} = .0606 \\
\text{Total} &= .9999
\end{align*}

Clearly, in this case we have no grounds for rejecting the hypothesis
that the two variables are independent. The probability of obtaining a
degree of association equal to or better than the one observed, and in the
same direction, is obtained by summing the probabilities of arrangements
3 and 4. This probability is .3636 + .0606 = .4242. Thus in about 42
samples in 100, a result equal to or better than the one observed would
occur by chance. With the present data no arrangement of the 2 × 2
table can lead to a statistically significant result.

Usually the probabilities associated with all possible arrangements of
the 2 × 2 table need not be calculated. We need only calculate the
probabilities associated with the observed table and those that represent
more extreme departures from expectation in the same direction. Let





Table 1 below represent the observed data. Tables 2 and 3 are the two
more extreme tables in the same direction.

        1                 2                 3

   4    2    6       5    1    6       6    0    6
   3    5    8       2    6    8       1    7    8
   7    7   14       7    7   14       7    7   14

The probabilities associated with these three tables are .2448, .0490,
and .0023. The sum of these probabilities is .2961. This falls far short of
significance, and we conclude that the evidence is insufficient to warrant
rejection of the hypothesis of independence. The sum of the probabilities
associated with Tables 2 and 3 is .0513. Thus the arrangement of Table 2
above, if it did occur, would fall short of significance at the 5 per cent level
for a one-tailed test. The only arrangement of the three shown which
could lead to a conclusion of significance, given the marginal restrictions,
is that shown in Table 3.
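The laborious calculation required by the exact test is readily delegated to a machine. The Python sketch below enumerates every arrangement of a 2 × 2 table consistent with its marginal totals, computes the exact probability of each from the factorial formula, and sums the probabilities of the observed table and of the more extreme tables in the same direction. The function names are our own, and the one-tailed sum assumes that the more extreme tables are those with a larger top left cell, as is the case in both examples above.

```python
from fractions import Fraction
from math import factorial


def table_probability(a, b, c, d):
    """Exact probability of one arrangement, given the marginal totals."""
    numerator = (factorial(a + b) * factorial(c + d)
                 * factorial(a + c) * factorial(b + d))
    denominator = (factorial(a + b + c + d) * factorial(a) * factorial(b)
                   * factorial(c) * factorial(d))
    return Fraction(numerator, denominator)


def all_arrangements(a, b, c, d):
    """Every table (A, B, C, D) sharing the margins of the given table."""
    row1, row2, col1 = a + b, c + d, a + c
    tables = []
    for top_left in range(max(0, col1 - row2), min(row1, col1) + 1):
        bottom_left = col1 - top_left
        tables.append((top_left, row1 - top_left,
                       bottom_left, row2 - bottom_left))
    return tables


def one_tailed_probability(a, b, c, d):
    """Sum over the observed table and those with a larger top left cell."""
    return sum(table_probability(*t)
               for t in all_arrangements(a, b, c, d) if t[0] >= a)


# First example: margins 3, 8 and 5, 6; observed table 2, 1 / 3, 5.
probabilities = [table_probability(*t) for t in all_arrangements(2, 1, 3, 5)]
print([str(p) for p in probabilities])
print(float(one_tailed_probability(2, 1, 3, 5)))

# Second example: observed table 4, 2 / 3, 5.
print(float(one_tailed_probability(4, 2, 3, 5)))
```

Exact fractions are used throughout, so the only rounding is that done when the results are printed as decimals.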

Tables to assist the application of exact tests of significance to 2 X 2 
tables have been prepared by Finney (1948). An adaptation of Finney’s 
tables is given by Siegel (1956). 

13.8

Miscellaneous observations on chi square

In this section we shall consider a number of miscellaneous points about
χ² not hitherto discussed.

One-tailed and two-tailed tests   Tables of χ² used for tests of
significance are based on one tail only, the tail to the right, of the sampling
distribution of χ². Table C of the Appendix shows that for 1 degree of
freedom 5 per cent of the area of the distribution falls to the right of
χ² = 3.84 and 1 per cent to the right of χ² = 6.64. These are not critical
values for directional, or one-tailed, tests as described in Chap. 11.
Although one tail only of the sampling distribution of χ² is used, the
tabled values are those required for testing the significance of a
difference regardless of direction, that is, for two-tailed tests. The critical
ratio or normal deviate required for significance at the 5 per cent level for
a two-tailed test is 1.96. If this value is squared, we obtain 3.84, the χ²
value at the 5 per cent level for 1 degree of freedom. For 1 degree of
freedom the square root of χ² is a normal deviate and may be used with
reference to the normal curve in applying two-tailed tests. In effect,








because χ² is the square of the normal deviate for 1 degree of freedom,
both tails of the normal curve are incorporated in the right tail of the χ²
curve. In many situations where χ² is applied, the idea of a directional,
or one-tailed, test has little meaning. In tests of goodness of fit and in
most tests of independence we are usually not concerned with the
direction of the difference observed. If a one-tailed test is required, the
proportionate areas in the chi-square tables should be halved. The value of
χ² required for significance at the 5 per cent level for a one-tailed test is
2.71 for df = 1. The corresponding value at the 1 per cent level is 5.41.
These are the squares of the normal deviates 1.64 and 2.33 required for
significance for a one-tailed test at the 5 and 1 per cent levels, respectively.
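The relation between the critical values of χ² for 1 degree of freedom and those of the normal deviate may be checked with the following Python sketch. The function name is our own. The standard library's `statistics.NormalDist` is used in place of a printed table of the normal curve, so the values may differ from tabled values in the last decimal place.

```python
from statistics import NormalDist


def chi_square_critical_1df(alpha, one_tailed=False):
    """Critical chi square for 1 df, obtained by squaring a normal deviate."""
    # A two-tailed test places alpha/2 in each tail of the normal curve.
    tail_area = alpha if one_tailed else alpha / 2.0
    z = NormalDist().inv_cdf(1.0 - tail_area)
    return z * z


for alpha in (0.05, 0.01):
    two = chi_square_critical_1df(alpha)
    one = chi_square_critical_1df(alpha, one_tailed=True)
    print(alpha, round(two, 2), round(one, 2))
```

The two-tailed values are those ordinarily tabled for χ²; the one-tailed values are the smaller ones quoted above.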


Chi square and sample size   The value of χ² is related to the size of the
sample. If an actual difference exists between observed and expected
values, this difference will tend to increase as sample size increases, χ²
will also increase, and the associated probability value will decrease.
Consider the following tables:

 Table                         1              2              3

 Bottom marginal totals     10  15  25     20  30  50     40  60  100
 χ²                            2.78           5.56          11.12

As the samples are doubled in size from 25 to 50 to 100, the differences
between the observed and expected values, O - E, are doubled and the χ²
values are doubled. If no actual difference exists between observed and
expected values, χ² will tend to remain unchanged as sample size increases.
For a constant difference between observed and expected values χ² will
decrease as sample size increases. If we double sample size and hold the
difference between observed and expected values fixed, the value of χ² will
be reduced by one-half.
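Both statements about sample size can be illustrated with a few lines of Python. The cell frequencies used below are hypothetical. They were chosen by us only because they have column totals of 10 and 15 and yield a χ² close to 2.78; they should not be taken as the cell values of the tables in the text. The function name is likewise our own.

```python
def chi_square_2x2(a, b, c, d):
    """Chi square for a fourfold table with cells A, B (top) and C, D (bottom)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical cell values (an assumption, see above).
base = (6, 4, 4, 11)

# Multiplying every cell by the same factor multiplies chi square by it.
values = []
for factor in (1, 2, 4):
    cells = [factor * x for x in base]
    values.append(chi_square_2x2(*cells))
    print(sum(cells), round(values[-1], 2))

# Holding O - E fixed while doubling the expected values halves chi square.
differences = [5, -5]
small = sum(d * d / e for d, e in zip(differences, [50, 50]))
large = sum(d * d / e for d, e in zip(differences, [100, 100]))
print(small, large)
```

The first loop shows χ² growing in proportion to N when the proportions in the cells are held constant; the last two lines show χ² halved when N is doubled and the differences are held constant.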


Alternative formula for chi square   We can readily demonstrate that

\[ \chi^2 = \sum \frac{(O - E)^2}{E} = \sum \frac{O^2}{E} - N \]

This alternative way of writing χ² is sometimes useful for computational
purposes.
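The agreement of the two ways of writing χ² may be confirmed numerically. The Python sketch below uses the observed and expected frequencies of Table 13.6. The variable names are our own, and the alternative form is taken to be the sum of O²/E less N, which holds whenever the expected frequencies sum to N.

```python
# Observed and expected frequencies from Table 13.6.
observed = [95, 67, 38]
expected = [125, 39, 36]
n = sum(observed)

# Definitional form: sum of (O - E)^2 / E.
definitional = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Alternative form: sum of O^2 / E, less N.
alternative = sum(o * o / e for o, e in zip(observed, expected)) - n

print(round(definitional, 2), round(alternative, 2))
```

The alternative form avoids the separate calculation of each difference O - E.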


2 × 2 tables with more than 1 degree of freedom   For most 2 × 2
tables the row and column totals are considered fixed and 1 degree of







freedom is associated with the variation in the data. Situations arise
where either the row or column totals, or both, are free to vary. In a
sociometric study on a class of 8 Jewish and 12 Protestant children, each
child may be asked to choose one other child with whom he would prefer
to play. He cannot choose himself. If choices are independent of
whether a child is Jewish or Protestant, what are the expected frequencies
of choices? On a strictly random basis, how many Jewish choosers will
make Protestant choices, and so on? Since a child cannot choose himself,
a Jewish child chooses from among 7 Jewish and 12 Protestant
children. The probability of a Jewish child choosing a Jewish child in
making a choice at random is 7/19 and of choosing a Protestant child 12/19.
Since eight choices are made, the expected frequency of Jewish choices
is (7/19) × 8 = 2.95. The expected frequency of Protestant choices is
(12/19) × 8 = 5.05. Likewise, we find that the expected frequency of Jewish
choices by Protestant children is (8/19) × 12 = 5.05, and the expected
frequency of Protestant choices is (11/19) × 12 = 6.95. The expected
frequencies are tabulated below together with the observed frequencies.

                  Expected                           Observed
                   chosen                             chosen
                 P        J                         P        J

 Chooser   J    5.05     2.95      8    Chooser  J   6        2       8
           P    6.95     5.05     12             P   8        4      12

               12.00     8.00     20                14        6      20


In this example the row totals are fixed. The column totals are free
to vary from expectation. In these data we observe a tendency for both
Jewish and Protestant children to choose Protestant children more
frequently than expectation. In this case a χ² based on a comparison of the
expected and observed cell frequencies has 2 degrees of freedom. χ² may
of course be applied to the observed frequencies in the usual way with
1 degree of freedom. This is a test of association, within the restrictions
of the marginal totals, of the religious affiliation of the choosers and the
chosen. It is not a test of randomness of choice. Fourfold tables may
occur where both row and column totals are free to vary. Such tables
arise where all expected frequencies are derived in a manner entirely
independent of the data. χ² here has 3 degrees of freedom.
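The expected frequencies under random choice, and the χ² with 2 degrees of freedom based upon them, may be obtained as in the following Python sketch. The function name is our own. No value of χ² for this comparison is given in the text, so the figure computed here is ours. The smallest expected frequency is below 5, and the probability obtained should accordingly be regarded as approximate.

```python
import math
from fractions import Fraction


def expected_choices(n_own, n_other):
    """Expected (own-group, other-group) choices made by one group.

    Each child chooses one other child at random and cannot choose himself,
    so the pool of possible choices is n_own + n_other - 1.
    """
    pool = n_own + n_other - 1
    own = Fraction(n_own - 1, pool) * n_own
    other = Fraction(n_other, pool) * n_own
    return own, other


j_to_j, j_to_p = expected_choices(8, 12)   # Jewish choosers
p_to_p, p_to_j = expected_choices(12, 8)   # Protestant choosers

# Cell order: J chooses P, J chooses J, P chooses P, P chooses J.
expected = [j_to_p, j_to_j, p_to_p, p_to_j]
observed = [6, 2, 8, 4]

chi2 = float(sum((o - e) ** 2 / e for o, e in zip(observed, expected)))
df = 2                        # row totals fixed, column totals free to vary
p = math.exp(-chi2 / 2.0)     # exact tail area when df = 2

print([round(float(e), 2) for e in expected], round(chi2, 3), round(p, 2))
```

The expected values agree with those tabulated above, and the small χ² gives no grounds for rejecting the hypothesis that choices are made at random.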


Reduction of an R × C table to a 2 × 2 table   A table with R rows
and C columns may be reduced to a 2 × 2 table in order to facilitate a
rapid test of association with χ². This procedure is legitimate enough







provided the points of dichotomy of the two variables are made without
reference to the cell frequencies. The investigator may decide a priori to
dichotomize about the two medians, or something of the sort. Data are
found where the points of dichotomy have been located in order to
maximize the association in the data and obtain thereby a significant χ².
This practice is spurious and should be enthusiastically discouraged.


EXERCISES 


1 In 180 throws of a die the observed frequencies of the values from 1 to 6
are 34, 27, 41, 23, 18, and 37. Test the hypothesis that the die is
unbiased.


2 A psychological test yields a distribution of scores as follows. 


Class 

interval 

F requency 

90 99 

1 

80- 89 

.7 

70 79 

17 

00 09 

30 

:>o ;>9 

.70 

40-49 

3 .7 

30 39 

10 

20-29 

0 

10-19 

4 

0 9 

2 


100 


Obtain the theoretical normal frequencies. Test the goodness of fit
between the theoretical and the observed frequencies.


3 How many cell frequencies are free to vary in tables with (a) two rows
and two columns, (b) two rows and three columns, (c) three rows and
five columns? Assume fixed marginal totals.


4 The following data relate to patients in a mental hospital:

                            Rating
               Improvement    No improvement

 Therapy A         16               28             44
 Therapy B          9               37             46

                   25               65             90






Test the hypothesis that method of therapy is independent of rating 
assigned. 

5 The following contingency table describes the relation between scores 
above and below the median on an examination and ratings of job 
performance for 100 employees. 



                         Rating
                 Below                Above
                 average   Average    average   Total
Above median       11        25         35        71
Below median       15         7          7        29
Total              26        32         42       100

Test the hypothesis that job performance is independent of examination 
results. 

6 A sample used in a market survey contains 100 males and 100 females. 
Of the males 33 and of the females 18 state a preference for brand A. 
Use χ² to test the hypothesis that no sex differences exist in consumer
preference.

7 Calculate χ² for the following tables, using Yates's correction for
continuity:

a                            Weight
                      Increase   No increase   Total
Gentled animals           5           2           7
Ungentled animals         3           5           8
Total                     8           7          15

b                                  Locus of lesion
                                   A     B     C    Total
Impairment in performance          2     5     4      11
No impairment in performance       3     2     2       7
Total                              5     7     6      18


8 Obtain the exact probabilities associated with all possible
arrangements of cell frequencies for the following 2 × 2 tables:








a b 


3 

1 

4 

1 

4 

2 

4 

(> 


3 

5 

5 

10 

h 

1 


In either case would any arrangement of cell frequencies justify a
rejection of the hypothesis of independence?





14

Rank Correlation Methods



14.1

Introduction


Ordinal, or rank-order, data may arise in a number of different ways. 
Quantitative measurements may be available, but ranks may be sub- 
stituted to reduce arithmetical labor or to make some desired form of 
calculation possible. For example, measurements of height and weight 
may be obtained for a group of school children. A correlation between 
the paired measurements could readily be calculated. The investigator 
may, however, choose to substitute ranks for the measurements and 
calculate a correlation between the paired ranks. In many situations
where ranking methods are used, quantitative measurements are not
available. The measuring operations used may be such that no
comparative statements about the intervals between members are possible. For
example, employees may be rank-ordered by supervisors on job
performance. School children may be ranked by teachers on social
adjustment. Whiskies may be rank-ordered by experienced judges on taste,
or participants in a beauty contest may be rank-ordered by judges on
pulchritude. In such cases the data are comprised of sets of ordinal
numbers, 1st, 2d, 3d, . . . , Nth. These are replaced by the cardinal
numbers 1, 2, 3, . . . , N, for purposes of calculation. The substitution
of cardinal numbers for ordinal numbers always assumes equality of 
intervals. The difference between the 1st and 2d member is assumed 
equal to the difference between the 2d and 3d, and so on. This assump- 
tion underlies all coefficients of rank correlation. Because of difficulties 
associated with the measurement of psychological variables, statistical 
methods for handling rank-order data are of particular interest to 
psychologists. 


14.2 

Spearman’s coefficient of rank correlation ρ

Consider a group of N individuals, $A_1, A_2, A_3, \ldots, A_N$, ranked on two
variables X and Y. The rankings on X may be denoted as
$X_1, X_2, X_3, \ldots, X_N$ and on Y as $Y_1, Y_2, Y_3, \ldots, Y_N$. A group of five
individuals, for example, may be ranked 1, 2, 3, 4, 5 on race prejudice and
3, 1, 2, 5, 4 on authoritarianism. The data are comprised of paired
integers extending from 1 to N. How may a coefficient of correlation
between the ranks be defined?

This problem may be approached by considering the sum of squares
of the differences between the paired ranks. Denote this quantity by Σd².
As in many similar situations, we use the sum of squares instead of the
sum, because the sum of the differences is always equal to zero. What
are the minimum and maximum values of Σd²? When the members are
ranked in the same order on both X and Y, the case of perfect positive
correlation, Σd² = 0 and is a minimum. Thus if the ranks on X are
1, 2, 3, 4, 5 and on Y 1, 2, 3, 4, 5, the differences are all zero. If the
paired ranks are in inverse order, the case of perfect negative
correlation, Σd² is a maximum. No arrangement of X with respect to Y
will produce a larger value of Σd². Thus if the ranks on X are
1, 2, 3, 4, 5 and on Y 5, 4, 3, 2, 1, the differences d are −4, −2, 0,
2, 4. The squares d² are 16, 4, 0, 4, 16 and Σd² = 40. It may be shown
that the maximum value of Σd² is given by


$$\Sigma d^2_{\max} = \frac{N(N^2 - 1)}{3}$$

It may also be shown that the value of Σd² expected when ranks on X are
independent of ranks on Y is one-half the maximum value of Σd², or

$$E(\Sigma d^2) = \frac{N(N^2 - 1)}{6}$$


Coefficients of correlation are conventionally defined to take the
values +1, 0, and −1 in the presence of a perfect positive, independent,
and perfect negative relation, respectively, between the two variables.
In the present case a measure of rank-order correlation which will meet
this requirement may be defined as

$$\rho = 1 - \frac{2\Sigma d^2}{\Sigma d^2_{\max}}$$

where ρ is the Greek letter rho. For a perfect positive correlation
Σd² = 0 and ρ = 1. For a perfect negative relation Σd² equals its
maximum value and ρ = −1. In the case of independence, 2Σd² equals the
maximum value and ρ = 0. By substituting the maximum value of Σd²,
N(N² − 1)/3, in the above formula, we obtain

$$\rho = 1 - \frac{6\Sigma d^2}{N(N^2 - 1)} \tag{14.1}$$

This is Spearman’s coefficient of rank correlation.





In Chap. 7 we presented the formula for Pearson's product-moment
correlation coefficient,

$$r = \frac{\Sigma (X - \bar{X})(Y - \bar{Y})}{(N - 1)\,s_x s_y}$$

Spearman's ρ is a particular case of the above formulation. It is the
particular case which arises where the variables are the first N consecutive
untied integers. If the above formula is applied directly to paired ranks,
the result is identical with that obtained by applying the formula for ρ.

The calculation of ρ is illustrated in Table 14.1. The calculation is
simple. We find the differences between the paired ranks, square these,
sum to obtain Σd², and then apply the formula for ρ.
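
The steps just described can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `spearman_rho` and `pearson_r` are assumptions introduced here, not part of the text, and the ranks are those of Table 14.1. The second function is included to check the statement that the product-moment formula applied to untied ranks gives the same value.

```python
def spearman_rho(x_ranks, y_ranks):
    """Spearman's rho for untied ranks: 1 - 6*sum(d^2) / (N(N^2 - 1))."""
    n = len(x_ranks)
    sum_d2 = sum((x - y) ** 2 for x, y in zip(x_ranks, y_ranks))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


def pearson_r(x, y):
    """Ordinary product-moment correlation, written out with plain sums."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Paired ranks of Table 14.1 (sum of d^2 is 92).
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [6, 3, 7, 2, 1, 8, 4, 9, 5, 10]

rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(round(rho, 3), round(r, 3))  # 0.442 0.442
```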

14.3

Spearman’s ρ with tied ranks

In arranging the members of a group in order, a judge may be unable to
discriminate between certain members. Where measurements are
replaced by ranks, certain measurements may be equal. These
circumstances give rise to tied ranks. If we attempt to replace the numbers
14, 19, 19, 22, 23, 23, 23, 25 by ranks, we observe immediately that 19
occurs twice and 23 three times. Under these circumstances we assign to

Table 14.1

Calculation of Spearman’s coefficient of rank correlation

                  Rank         Difference
Individual      X      Y        d       d²
A1              1      6       -5       25
A2              2      3       -1        1
A3              3      7       -4       16
A4              4      2        2        4
A5              5      1        4       16
A6              6      8       -2        4
A7              7      4        3        9
A8              8      9       -1        1
A9              9      5        4       16
A10            10     10        0        0
Total                           0    Σd² = 92

$$\rho = 1 - \frac{6 \times 92}{10(100 - 1)} = .442$$




each member the average rank which the tied observations occupy.
Thus 14 is ranked 1, the two 19's are ranked 2.5 and 2.5, the 22 is ranked
4, the three 23's are ranked 6, 6, and 6, and 25 is ranked 8. Having
replaced the tied ranks by their average rank, we proceed as before in the
calculation of ρ. A calculation with tied ranks is illustrated in Table 14.2.
If the ties are numerous, this type of adjustment for tied ranks may not
prove altogether satisfactory.

The development of p from the ordinary product-moment r assumes 
that the ranks are the first N integers. Where tied ranks occur this is not 
so. Where a substantial number of tied ranks is found, the departure of 
the sum of squares of ranks from the sum of squares of the first N integers 
will be appreciable and the value of p will be thereby affected. While 
other procedures for correcting for ties may be used, one convenient
approach is to calculate an ordinary product-moment correlation for the 
paired observations where average ranks have been substituted for ties. 
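
As a rough illustration of the average-rank convention and of the product-moment approach just mentioned, the following Python sketch may help. The helper names `average_ranks` and `pearson_r` are assumptions made for this example; the measurements are the eight numbers used above, and the paired ranks are those of Table 14.2.

```python
def average_ranks(values):
    """Rank from smallest (rank 1) upward, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + 1 + end + 1) / 2  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson_r(x, y):
    """Ordinary product-moment correlation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


print(average_ranks([14, 19, 19, 22, 23, 23, 23, 25]))
# [1.0, 2.5, 2.5, 4.0, 6.0, 6.0, 6.0, 8.0]

# Paired average ranks of Table 14.2.
x_ranks = [1, 2.5, 2.5, 4.5, 4.5, 6, 8, 8, 8, 10]
y_ranks = [8, 6.5, 4.5, 2, 1, 3, 4.5, 6.5, 9, 10]
r_tied = pearson_r(x_ranks, y_ranks)
print(round(r_tied, 3))
```

With these ranks the product-moment value comes out a little below the .321 given by the uncorrected formula for ρ, which is the kind of departure the paragraph above describes.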

14.4 

Testing the significance of Spearman’s ρ

The study of the sampling distribution of ρ is approached by considering
all possible, and equally probable, arrangements of rankings on Y for a


Table 14.2

Calculation of Spearman’s coefficient of rank correlation
with tied ranks

                  Rank         Difference
Individual      X      Y        d        d²
A1              1      8       -7      49.00
A2              2.5    6.5     -4      16.00
A3              2.5    4.5     -2       4.00
A4              4.5    2        2.5     6.25
A5              4.5    1        3.5    12.25
A6              6      3        3       9.00
A7              8      4.5      3.5    12.25
A8              8      6.5      1.5     2.25
A9              8      9       -1       1.00
A10            10     10        0        .00
Total                               Σd² = 112.00

$$\rho = 1 - \frac{6 \times 112}{10(100 - 1)} = .321$$





fixed ranking on X. The model is one where ranks on Y are drawn at
random from a hat and paired successively against fixed ranks on X.
For N = 2, if X has the ranks 1, 2, only two arrangements of Y are
possible, 1, 2 and 2, 1. Only two values of ρ are possible, +1 and −1. For
N = 3, if X has the ranks 1, 2, 3, there are six possible arrangements of
Y and, as it turns out, four possible values of ρ: −1, −½, +½, and +1.

The sampling distribution of ρ has been studied by Kendall (1943). For
small values of N the sampling distribution of ρ is bimodal. For N = 7 or
N = 8, the distribution has a somewhat jagged or serrated appearance.
As N increases in size, the distribution seems to approach the normal
form.

Table G of the Appendix shows critical values of ρ for different values
of N required for significance at various levels. Observe that for a small
N, values of ρ of very substantial size must be obtained before we have
adequate grounds for rejecting the hypothesis that no association exists
between the rankings. For N = 10 we require a ρ equal to or greater
than .564 before we can argue that a significant association exists in a
positive direction at the 5 per cent level.

With N = 10 or greater we may test the significance of ρ by using a t
given by

$$t = \rho \sqrt{\frac{N - 2}{1 - \rho^2}} \tag{14.2}$$

This quantity has a t distribution with N − 2 degrees of freedom. For
example, where N = 10 and ρ = .564, t = 1.93. For 8 degrees of
freedom the value of t at the .10 level is 1.86. For a two-tailed test we have
insufficient grounds for arguing that the observed ρ is significantly
different from zero. For a one-tailed test the observed ρ is significant at
about the 5 per cent level.
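
The arithmetic of this example may be checked with a short Python sketch. The function name `spearman_t` is an assumption introduced here for illustration; the formula is the usual t for a correlation coefficient with N − 2 degrees of freedom, which reproduces the value 1.93 quoted above.

```python
import math


def spearman_t(rho, n):
    """t statistic for Spearman's rho, referred to t with n - 2 degrees of freedom."""
    return rho * math.sqrt((n - 2) / (1 - rho ** 2))


t = spearman_t(0.564, 10)
print(round(t, 2))  # 1.93
```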

14.5

Kendall’s coefficient of rank correlation τ

An alternative form of rank correlation τ, or tau, has been developed by
Kendall (1943, 1956). Both Spearman’s and Kendall’s coefficients
apply to the same type of data. In the definition of Kendall’s tau use is
made of a statistic S which is descriptive of the disarray in a set of ranks.
Consider the following paired ranks:

X   1   2   3   4   5

Y   1   4   3   5   2

The X ranks are in their natural order; the Y ranks exhibit a degree of
disarray. To calculate S, we compare each rank on Y with every other





rank, there being N(N − 1)/2 such comparisons for N ranks. If a pair
is ranked in its natural order, say 1 and 4, a weight +1 is assigned. If a
pair is ranked in an inverse order, say 4 and 3, a weight −1 is assigned.
The statistic S is the sum of such weights over all N(N − 1)/2
comparisons. In the above example the weights are +1, +1, +1, +1, −1,
+1, −1, +1, −1, −1, and S = 2. A positive value of S means that the
Y ranks show a tendency to increase with increase in the X ranks; a
negative S means that the Y ranks show a tendency to decrease with increase
in X.

The maximum possible value of S is N(N − 1)/2, which is the
number of comparisons. Kendall’s coefficient of rank correlation, τ, is
defined as the obtained value of S divided by its maximum possible
value; that is,

$$\tau = \frac{S}{\tfrac{1}{2}N(N - 1)} \tag{14.3}$$

The statistic τ has a value −1 when the paired ranks are in an inverse
order, and a value +1 when the paired ranks are in the same order. In
the above example S = 2, N = 5, and τ = 2/10 = .20.
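
The counting of weights can be sketched as follows in Python. The function names `kendall_s` and `kendall_tau` are assumptions made for this illustration, and the sketch assumes that the Y ranks are untied and are listed in the natural order of the X ranks, as in the example above.

```python
from itertools import combinations


def kendall_s(y_ranks):
    """S for untied Y ranks listed in the natural order of the X ranks."""
    s = 0
    # combinations() yields each pair once, earlier member first.
    for first, second in combinations(y_ranks, 2):
        s += 1 if second > first else -1
    return s


def kendall_tau(y_ranks):
    """tau = S divided by its maximum possible value N(N - 1)/2."""
    n = len(y_ranks)
    return kendall_s(y_ranks) / (n * (n - 1) / 2)


y = [1, 4, 3, 5, 2]
print(kendall_s(y), kendall_tau(y))  # 2 0.2
```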


14.6 

Kendall’s τ with tied ranks

If ties occur, the convention is adopted, as in the calculation of
Spearman’s ρ, of replacing the tied values by the average rank. A comparison
of two tied values on Y receives a weight of zero. If ties occur on X, a
comparison of the corresponding paired Y values will also receive a weight
of zero, regardless of whether the paired Y values are tied. Consider the
following example with no tie on X and one tied pair on Y.

X   1   2   3     4     5   6

Y   2   3   4.5   4.5   1   6

On comparing each rank on Y with every other rank, and assigning a +1
for a pair in their natural order, a −1 for a pair in an inverse order, and a
0 for a tie, we obtain +1, +1, +1, −1, +1, +1, +1, −1, +1, 0, −1,
+1, −1, +1, +1, and S = 6. Consider another example with tied
values on both X and Y.

X   1.5   1.5   3     5     5   5

Y   2     3     4.5   4.5   1   6

Here the comparison on Y of 2 with 3 receives a weight of zero, because
the order of the first two paired values on X is arbitrary. Similarly,
comparisons on Y involving the last three values will receive weights of




zero, because of the triplet of ties on X. In the above example the
weights are 0, +1, +1, −1, +1, +1, +1, −1, +1, 0, −1, +1, 0, 0, 0,
and S = 4.

To calculate tau with tied ranks, S is calculated in the manner
described above, and the following formula applied:

$$\tau = \frac{S}{\sqrt{[\tfrac{1}{2}N(N - 1) - T_x][\tfrac{1}{2}N(N - 1) - U_y]}} \tag{14.4}$$

In this formula $T_x = \tfrac{1}{2}\sum^{m} t(t - 1)$ and $U_y = \tfrac{1}{2}\sum^{r} u(u - 1)$.
One ranking contains m sets of t ties; the other ranking, r sets of u ties.
To illustrate, consider the example immediately above with ties on both
X and Y. Here $T_x = \tfrac{1}{2}[2(2 - 1) + 3(3 - 1)] = 4$; also
$U_y = \tfrac{1}{2}[2(2 - 1)] = 1$. In this example N = 6, and tau is as follows:

$$\tau = \frac{4}{\sqrt{[\tfrac{1}{2} \times 6(6 - 1) - 4][\tfrac{1}{2} \times 6(6 - 1) - 1]}} = .32$$
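
A minimal Python sketch of the tied-rank calculation is given below. The names `kendall_s_tied`, `tie_term`, and `kendall_tau_tied` are assumptions introduced for illustration. A pair receives weight zero when it is tied on either variable, and the tie terms are found by counting how often each rank value occurs.

```python
from collections import Counter
from itertools import combinations
import math


def sign(v):
    """Return +1, 0, or -1 according to the sign of v."""
    return (v > 0) - (v < 0)


def kendall_s_tied(x, y):
    """S with weight 0 whenever a pair is tied on X or on Y."""
    return sum(sign(x[j] - x[i]) * sign(y[j] - y[i])
               for i, j in combinations(range(len(x)), 2))


def tie_term(ranks):
    """One-half the sum of t(t - 1) over the groups of tied ranks."""
    return sum(t * (t - 1) for t in Counter(ranks).values()) / 2


def kendall_tau_tied(x, y):
    """tau with the tie adjustment of formula (14.4)."""
    n = len(x)
    pairs = n * (n - 1) / 2
    s = kendall_s_tied(x, y)
    return s / math.sqrt((pairs - tie_term(x)) * (pairs - tie_term(y)))


x = [1.5, 1.5, 3, 5, 5, 5]
y = [2, 3, 4.5, 4.5, 1, 6]
print(kendall_s_tied(x, y), round(kendall_tau_tied(x, y), 2))  # 4 0.32
```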


14.7 

The significance of S and τ

The sampling distribution of S is obtained by considering the N factorial
arrangements of Y in relation to X. A value of S may be determined for
each of the N factorial arrangements. The distribution of these N factorial
values is the sampling distribution of S. This distribution is symmetrical.
Frequencies taper off systematically from the maximum value toward the
tails. The distribution of S rapidly approaches the normal form. For
N > 10 the normal approximation to the exact distribution is very close.
The exact sampling distributions of S for N = 4 to N = 10 are given by
Kendall (1955).

In testing the significance of the association between paired ranks,
it is more convenient to apply a test directly to S rather than to τ. The
variance of the sampling distribution of S without ties is given by

$$\sigma_s^2 = \frac{N(N - 1)(2N + 5)}{18} \tag{14.5}$$

If the normal approximation to the exact sampling distribution of S is
used, a correction for continuity should be applied. This is done by
subtracting unity from the absolute value of S. To apply a significance





test, we divide S, corrected for continuity, by the standard deviation of
the sampling distribution to obtain the normal deviate z, as follows:

$$z = \frac{|S| - 1}{\sqrt{N(N - 1)(2N + 5)/18}} \tag{14.6}$$

As usual, 1.96 and 2.58 are required for significance at the .05 and .01
levels, respectively, for a nondirectional test. To illustrate, consider
the paired ranks:

X   1   2   3   4   5   6

Y   2   4   3   5   1   6

Here the weights are +1, +1, +1, −1, +1, −1, +1, −1, +1, +1,
−1, +1, −1, +1, +1, and S = 5. Hence

$$z = \frac{5 - 1}{\sqrt{6(6 - 1)(2 \times 6 + 5)/18}} = \frac{4}{5.32} = .75$$

Here the association between the paired ranks is clearly not significant.

Because with problems involving ranks ties are very common, it is
useful to know the variance of the sampling distribution of S when ties
occur. If ties occur in one set of ranks, and not in the other, the variance
of S becomes

$$\sigma_s^2 = \tfrac{1}{18}\Big[N(N - 1)(2N + 5) - \sum^{m} t(t - 1)(2t + 5)\Big] \tag{14.7}$$

One set of ranks contains m sets of t ties. Note that the effect of ties is to
reduce the variance $\sigma_s^2$. Examination of the above formula shows that
the effect of ties is to reduce the variance by unity for each tied pair, by
3.67 for each triplet of ties, and by 8.67 for each quadruplet of ties.
Thus we have available a very convenient correction procedure.

If ties occur in both sets of ranks, the variance of S is given by

$$
\begin{aligned}
\sigma_s^2 = {} & \tfrac{1}{18}\Big[N(N - 1)(2N + 5) - \sum^{m} t(t - 1)(2t + 5) - \sum^{r} u(u - 1)(2u + 5)\Big] \\
& + \frac{1}{9N(N - 1)(N - 2)}\Big[\sum^{m} t(t - 1)(t - 2)\Big]\Big[\sum^{r} u(u - 1)(u - 2)\Big] \\
& + \frac{1}{2N(N - 1)}\Big[\sum^{m} t(t - 1)\Big]\Big[\sum^{r} u(u - 1)\Big]
\end{aligned}
\tag{14.8}
$$

In this formula one set of ranks contains m sets of t ties; the other set of
ranks, r sets of u ties. This formula is the more general form for the
variance of the distribution of S. As in the untied case, to apply a test of
significance, we divide S, corrected for continuity, that is, |S| − 1, by
the standard deviation of the sampling distribution $\sigma_s$ to obtain the
normal deviate z.
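
The untied test may be illustrated with a brief Python sketch. The function name `s_normal_deviate` is an assumption made for this example; it applies the untied variance N(N − 1)(2N + 5)/18 and the continuity correction |S| − 1, and is checked against the six paired ranks used in this section, for which S = 5.

```python
import math


def s_normal_deviate(s, n):
    """Normal deviate for S without ties, with the continuity correction |S| - 1."""
    variance = n * (n - 1) * (2 * n + 5) / 18
    return (abs(s) - 1) / math.sqrt(variance)


z = s_normal_deviate(5, 6)
print(round(z, 2))  # 0.75
```

A value of z below 1.96 falls short of significance at the .05 level for a nondirectional test.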

14.8 

Rank correlation when one variable is a dichotomy 

The statistic S, and the rank correlation coefficient tau, may be
calculated when one set of ranks is a dichotomy. Consider an example given
by Kendall (1950) for 15 boys and girls ranked on an examination.
Here sex is a dichotomous variable. Let X be ranks on the examination,
and Y sex.

X   1  2  3  4  5  6  7  8  9  10  11  12  13  14  15

Y   B  B  G  B  G  G  B  B  B  G   B   G   B   G   G

Here the eight boys may be considered tied on Y, as may the seven girls.
Adopting the average-rank procedure, a rank of 4.5 may be assigned to
each of the eight boys and a rank of 12 to each of the seven girls. The
rank 4.5 is the average of the ranks 1 to 8, and the rank 12 is the average
of the ranks 9 to 15. Thus we may write

X   1    2    3   4    5   6   7    8    9    10  11   12  13   14  15

Y   4.5  4.5  12  4.5  12  12  4.5  4.5  4.5  12  4.5  12  4.5  12  12

The statistic S calculated from these ranks is

S = 7 + 7 − 6 + 6 − 5 − 5 + 4 + 4 + 4 − 2 + 3 − 1 + 2 + 0 = 18

The coefficient tau is obtained using formula (14.4), and is

$$\tau = \frac{18}{\sqrt{[\tfrac{1}{2} \times 15(15 - 1)][\tfrac{1}{2} \times 15(15 - 1) - 49]}} = .235$$

where $T_x = 0$ and $U_y = \tfrac{1}{2}(8 \times 7) + \tfrac{1}{2}(7 \times 6) = 49$. In this example the
rank 4.5 has been assigned to boys and the rank 12 to girls, resulting in a
positive correlation. This, of course, is arbitrary. The ranks could have
been reversed and a negative correlation obtained.

To test the significance of the association, the standard error $\sigma_s$ is
calculated and a normal deviate z obtained. In the present illustrative
example X contains no ties, Y contains two groups of ties, and $\sigma_s$ may be
obtained using the square root of formula (14.7), as follows:

$$\sigma_s = \sqrt{\tfrac{1}{18}[15 \times 14 \times 35 - 8 \times 7 \times 21 - 7 \times 6 \times 19]} = 17.28$$





Subtracting unity from the absolute value of S as a continuity correction
and dividing by $\sigma_s$, we obtain the normal deviate

z = (|S| − 1)/$\sigma_s$ = 17/17.28 = .984

Clearly the association between ranks on the examination and sex is not
significant.

When one variable is a dichotomy and no ties occur in the X ranks,
|S| is reduced by unity as a continuity correction. When one variable is
a dichotomy and the other contains m groupings of values of extent
$t_1, t_2, \ldots, t_m$, the absolute value of S is reduced by
$(2N - t_1 - t_m)/2(m - 1)$. Here $t_1$ and $t_m$ are the number of tied
values in the first and last groups. The reader should note also that when
ties occur in the X variable, formula (14.8) and not formula (14.7) should
be used in calculating $\sigma_s$.
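
The example of the 15 boys and girls can be worked through in Python as a check on the arithmetic. This is a sketch under stated assumptions: the function name `s_variance_one_tied` and the string coding of sex are introduced here for illustration, and the variance used is the one for ties in a single ranking, formula (14.7).

```python
import math


def s_variance_one_tied(n, tie_sizes):
    """Variance of S when only one ranking contains ties (formula 14.7)."""
    untied = n * (n - 1) * (2 * n + 5)
    correction = sum(t * (t - 1) * (2 * t + 5) for t in tie_sizes)
    return (untied - correction) / 18


# Sex of the 15 children in order of examination rank 1, 2, ..., 15.
sexes = "BBGBGGBBBGBGBGG"
y = [4.5 if sex == "B" else 12 for sex in sexes]
n = len(y)

# Weight +1 if the later Y rank is larger, -1 if smaller, 0 if tied.
s = sum((y[j] > y[i]) - (y[j] < y[i])
        for i in range(n) for j in range(i + 1, n))

sigma = math.sqrt(s_variance_one_tied(n, [8, 7]))
z = (abs(s) - 1) / sigma
print(s, round(sigma, 2), round(z, 3))  # 18 17.28 0.984
```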


14.9 

The coefficient of concordance W 

For data comprised of m sets of ranks, where m > 2, a descriptive
measure of the agreement or concordance between the m sets is provided by
Kendall’s coefficient of concordance W. The data of Table 14.3 consist of
six ranks assigned by four judges. These data were obtained in an
investigation on interviewing technique. Four interviewers were required to
interview six job applicants and rank order them on suitability for
employment. If perfect agreement were observed between the four
interviewers, one applicant would be assigned a 1 by all four. The sum
of his ranks would be 4. Another applicant would be assigned a 2 by all
four interviewers. The sum of his ranks would be 8. The sum of ranks
for the six applicants would be 4, 8, 12, 16, 20, 24, although not necessarily in


Table 14.3

Ranks assigned to six job applicants by four interviewers

                        Applicant
Interviewer     a     b     c     d     e     f
A               6     4     1     2     3     5
B               5     3     1     2     4     6
C               6     4     2     1     3     5
D               3     1     4     5     2     6
R_j            20    12     8    10    12    22





that order. In general, when perfect agreement exists among ranks 
assigned by m judges to N members, the rank sums are m, 2m, 3m, 
4m, . . . ,Nm. The total sum of N ranks for m judges is mN(N + l)/2, 
and the mean rank sum is m(N + l)/2. 

The degree of agreement between judges reflects itself in the varia- 
tion in the rank sums. When all judges agree, this variation is a maxi- 
mum. Disagreement between judges reflects itself in a reduction in the 
variation of rank sums. For maximum disagreement the rank sums will 
tend to be more or less equal. This circumstance provides the basis for 
the definition of a coefficient of concordance. 

Let $R_j$ represent the rank sum of the jth individual. The sum of
squares of deviations of the rank sums about their mean for N individuals is

$$S = \sum_{j=1}^{N} \Big(R_j - \frac{\Sigma R_j}{N}\Big)^2 \tag{14.9}$$

The maximum value of this sum of squares occurs when perfect agreement
exists between judges and is equal to

$$\frac{m^2(N^3 - N)}{12}$$

The coefficient of concordance W is defined as the ratio of S to the
maximum possible value of S and is

$$W = \frac{12S}{m^2(N^3 - N)} \tag{14.10}$$

When perfect agreement exists between judges, W = 1. When
maximum disagreement exists, W = 0. W does not take negative values.
With more than two judges complete disagreement cannot occur. For
example, if A and B are in complete disagreement and A and C are also
in complete disagreement, then B and C must be in complete agreement.

In the example of Table 14.3 the rank totals are 20, 12, 8, 10, 12, and
22. The sum of ranks is 84. The mean rank total, the rank sum
expected in the case of independence, is 84/6 = 14. The sum of squares of
deviations about this mean is

$$S = (20 - 14)^2 + (12 - 14)^2 + (8 - 14)^2 + (10 - 14)^2 + (12 - 14)^2 + (22 - 14)^2 = 160$$

In our example m = 4 and N = 6 and the coefficient of concordance is

$$W = \frac{12 \times 160}{4^2(6^3 - 6)} = .571$$
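
The same calculation may be sketched in Python. The function name `kendall_w` and the list-of-lists layout (one list of ranks per interviewer) are assumptions made for this illustration; the ranks are those of Table 14.3, and the sketch assumes no tied ranks.

```python
def kendall_w(rankings):
    """Return (S, W) for m untied rankings of the same N objects."""
    m, n = len(rankings), len(rankings[0])
    # Rank sum for each object, taken over the m judges.
    rank_sums = [sum(judge[j] for judge in rankings) for j in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    return s, 12 * s / (m ** 2 * (n ** 3 - n))


table_14_3 = [
    [6, 4, 1, 2, 3, 5],  # interviewer A
    [5, 3, 1, 2, 4, 6],  # interviewer B
    [6, 4, 2, 1, 3, 5],  # interviewer C
    [3, 1, 4, 5, 2, 6],  # interviewer D
]
s, w = kendall_w(table_14_3)
print(s, round(w, 3))  # 160.0 0.571
```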

The concordance among m sets of ranks may be described by
calculating Spearman rank-order correlation coefficients between all
possible pairs of ranks and finding the average value, denoted by $\bar{\rho}$. This
average is related to W. The relation is given by

$$\bar{\rho} = \frac{mW - 1}{m - 1} \tag{14.11}$$

For the particular case where m = 2 the relation is $\bar{\rho} = 2W - 1$. For
W = 0, $\bar{\rho} = -1$; for W = .5, $\bar{\rho} = 0$; and for W = 1, $\bar{\rho} = 1$.


14.10 

The coefficient of concordance with tied ranks 


Where tied ranks occur, proceed as before and assign to each member the
average rank which the tied observations occupy. If the ties are not
numerous, we may compute W directly from the data without further
adjustment. If the ties are numerous, a correction factor is calculated
for each set of ranks. This correction factor is

$$T = \frac{\Sigma(t^3 - t)}{12} \tag{14.12}$$

For example, if the ranks on X are 1, 2.5, 2.5, 4, 5, 6, 8, 8, 8, 10, we have
two groups of ties, one of two ranks and one of three ranks. The
correction factor for this set of ranks for X is

$$T_x = \frac{(2^3 - 2) + (3^3 - 3)}{12} = 2.5$$

A correction factor T is calculated for each of the m sets of ranks, and
these are added together over the m sets to obtain ΣT. We then apply a
formula for W in which this correction factor is incorporated. The
formula is

$$W = \frac{S}{\tfrac{1}{12}m^2(N^3 - N) - m\Sigma T} \tag{14.13}$$

The application of this correction tends to increase the size of W. The
correction has a small effect unless ties are quite numerous.
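
A short Python sketch of the tie correction follows. The names `tie_correction` and `kendall_w_tied` are assumptions introduced here; the correction is computed by counting how often each rank value occurs within a set of ranks, and the corrected W subtracts m times the summed corrections from the denominator.

```python
from collections import Counter


def tie_correction(ranks):
    """T = sum of (t^3 - t) / 12 over the groups of tied ranks in one set."""
    return sum(t ** 3 - t for t in Counter(ranks).values()) / 12


def kendall_w_tied(rankings):
    """W with the correction for ties; reduces to the usual W when no ties occur."""
    m, n = len(rankings), len(rankings[0])
    rank_sums = [sum(judge[j] for judge in rankings) for j in range(n)]
    mean_sum = sum(rank_sums) / n
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    total_t = sum(tie_correction(judge) for judge in rankings)
    return s / (m ** 2 * (n ** 3 - n) / 12 - m * total_t)


print(tie_correction([1, 2.5, 2.5, 4, 5, 6, 8, 8, 8, 10]))  # 2.5
```

Because the correction is subtracted from the denominator, any ties make the corrected W larger than the uncorrected value, in line with the remark above.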


14.11

Significance of the coefficient of concordance W 

For N of 7 or less, values of W required for significance at the 5 and 1 per
cent levels have been tabulated by Friedman (1940) and are reproduced in
Kendall (1955) and Siegel (1950). A useful adaptation of these tables is
given by Edwards (1954). Critical values of W depend both on m, the
number of sets of ranks, and on N, the number of ranks in each set. For





N greater than 7, a χ² test may be applied. Calculate the quantity

$$\chi^2 = m(N - 1)W \tag{14.14}$$

This has a chi-square distribution with N − 1 degrees of freedom. For
the data of Table 14.3, S = 160, W = .571, m = 4, and N = 6.
Reference to Edwards’s table provides critical values of .505 and .621 for
significance at the 5 and 1 per cent levels. If we apply the chi-square test
to the same data, we obtain

$$\chi^2 = 4(6 - 1)(.571) = 11.42$$

For df = 6 − 1 = 5 the values of χ² required for significance are 11.07
and 15.09 at the 5 and 1 per cent levels, and as before we are led to the
conclusion of significant association at the 5 per cent level. Of course, in
this case the tabled values are to be preferred because N is less than 7.
For N less than 7 the chi-square test will provide a very rough estimate of 
the required probabilities. Other procedures for testing the significance 
of W exist. For a more thorough discussion of this problem see Edwards 
(1954). 
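
The chi-square approximation is simple enough to sketch directly. The function name `w_chi_square` is an assumption made for this illustration, and the critical values used in the comparison are the tabled values 11.07 and 15.09 quoted above for 5 degrees of freedom.

```python
def w_chi_square(w, m, n):
    """Return (chi-square, degrees of freedom) for a coefficient of concordance W."""
    return m * (n - 1) * w, n - 1


chi2, df = w_chi_square(0.571, 4, 6)
print(round(chi2, 2), df)  # 11.42 5

# Compare with the critical values quoted in the text for df = 5.
significant_05 = chi2 >= 11.07
significant_01 = chi2 >= 15.09
print(significant_05, significant_01)  # True False
```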

14.12 

The coefficient of consistence K 

To obtain a ranking of objects on an attribute, the objects may be pre- 
sented two at a time in all possible pairs and a judge required to make a 
choice on the presentation of each pair. Thus a choice is made between 
every object and every other object. This procedure is known as the 
method of paired comparisons and has been widely used in psychological 
work. The method is usually assumed to yield a more reliable ordering 
than that obtained by requiring a judge to order a whole group of objects 
directly. The number of possible pairs is the number of combinations of 
N things taken two at a time, or N(N — 1 )/2. As N increases, the num- 
ber of comparisons increases very rapidly; consequently for large N the 
method is frequently impractical. 

In the method of paired comparisons we may wish to ascertain the 
consistency of the choices made. Let A, B, and C be three objects. If
A is preferred to B and B is preferred to C, consistency of judgment would 
require that A be preferred to C. If C is preferred to A this latter choice 
is clearly inconsistent with the two previous choices. What meaning 
attaches to the presence of inconsistent choices? Let A, B , and C be 
red, blue, and yellow cards, each of a different saturation. A judge may 
prefer red to blue, blue to yellow, and then may indicate a preference of 
yellow to red. This inconsistent choice may result because the judge 





may be unable to discriminate and may indicate preferences in a more or 
less haphazard fashion. Many inconsistent choices in the method of 
paired comparisons result because the task requires a refinement of dis- 
crimination which is beyond the capacity of the judge. Inconsistent 
responses may also arise because the dimension of judgment has changed.
The red card may be preferred to the blue and the blue to the yellow on
the basis of hue. The yellow may be preferred to the red on the basis of
saturation. A different dimension is used as a basis of choice and leads
to the presence of an inconsistency. To illustrate further, an orange
may be preferred to a peach because of its color, a peach may be
preferred to a pear because of its flavor, a pear may be preferred to an orange
because of its shape, and thus an inconsistency arises. Where
inconsistencies are numerous, a question may attach to the meaning of the rank
ordering of objects obtained. It is convenient to represent a choice of A in
preference to B by the notation A → B and a choice of B to A by B → A.
The sequence A → B → C → A is an inconsistent triplet, or triad, of
choices. For any set of paired comparisons between N objects the
number of inconsistent triads may be counted and used to define a coefficient
of consistency of response.

Responses obtained by the method of paired comparisons may be
represented in tabular fashion in the form of a response pattern as shown
in Table 14.4. This table shows paired comparisons between nine objects,
A, B, C, . . . , H, I. A is preferred to B, and a 1 is entered in the cell
corresponding to row A and col. B above the main diagonal. A
complementary 0 is entered in col. A and row B below the main diagonal.
All other choices may be similarly represented. We note that where no
response inconsistencies are present, all entries on one side of the main
diagonal are 1's and all entries on the other side 0's. In this table the
presence of some 0's above the main diagonal and the complementary 1's
below it indicates the presence of inconsistencies. Let us now sum the
rows of Table 14.4. If no inconsistencies were present, the row sums
would be the numbers 8, 7, 6, 5, 4, 3, 2, 1, 0. Because of the presence of
inconsistencies, the actual obtained numbers are 7, 6, 5, 5, 4, 3, 3, 2, 1,
although not in that order. The effect of inconsistencies is to reduce the
variability of the numbers obtained by adding up the rows of the response
pattern. Denote a row sum by R. The mean of the row sums is
$\bar{R} = \Sigma R/N$, which may be shown equal to (N − 1)/2. The sum of
squares of row sums about this mean is:

$$\Sigma(R - \bar{R})^2 = \Sigma R^2 - \frac{N(N - 1)^2}{4} \tag{14.15}$$

It is appropriate to consider the maximum and minimum values of this
sum of squares. The maximum value of $\Sigma(R - \bar{R})^2$ occurs when no




inconsistencies are present in the response pattern and is equal to
N(N² − 1)/12. The minimum value of $\Sigma(R - \bar{R})^2$ depends on whether N is odd
or even. If N is odd the minimum value of $\Sigma(R - \bar{R})^2$ is 0. If N is
even, it may be shown that the minimum value of $\Sigma(R - \bar{R})^2$ is not 0,
but is N/4 (Kendall, 1941). We then define a coefficient of consistence of
response K as follows:

$$K = \frac{\text{observed sum of squares} - \text{minimum sum of squares}}{\text{maximum sum of squares} - \text{minimum sum of squares}} \tag{14.16}$$

Simple substitution shows that if N is odd

$$K = \frac{12\,\Sigma(R - \bar{R})^2}{N(N^2 - 1)} \tag{14.17}$$

and if N is even

$$K = \frac{12\,\Sigma(R - \bar{R})^2 - 3N}{N(N^2 - 4)} \tag{14.18}$$


Table 14.4

Response pattern for paired comparisons between nine objects
and calculation of coefficient of consistence

       A   B   C   D   E   F   G   H   I    Row sum R   (R − R̄)²
A      -   1   0   0   1   1   1   0   1        5           1
B      0   -   1   1   1   1   0   1   1        6           4
C      1   0   -   0   0   1   1   1   1        5           1
D      1   0   1   -   1   1   1   1   1        7           9
E      0   0   1   0   -   1   1   1   0        4           0
F      0   0   0   0   0   -   1   1   1        3           1
G      0   1   0   0   0   0   -   1   1        3           1
H      1   0   0   0   0   0   0   -   0        1           9
I      0   0   0   0   1   0   0   1   -        2           4

Σ(R − R̄)² = 30

$$K = \frac{12\,\Sigma(R - \bar{R})^2}{N(N^2 - 1)} = \frac{12 \times 30}{9(9^2 - 1)} = .50$$





This is Kendall's coefficient of consistence. It has an expected value of 0
when responses are assigned at random, the case of maximal inconsistency,
and 1 when no inconsistencies are present.

The calculation of K is illustrated in Table 14.4. In this example N
is odd, $\sum (R - \bar{R})^2 = 30$, and $K = .50$.

How may the coefficient K be interpreted? The number of inconsistent
triads of the kind A → B → C → A may be denoted by d, which is
related to the coefficient K. It may be shown that when N is odd

$$d = \frac{N(N^2 - 1)(1 - K)}{24} \tag{14.19}$$

and when N is even

$$d = \frac{N(N^2 - 4)(1 - K)}{24} \tag{14.20}$$

In the example of Table 14.4, the number of inconsistencies d is found to
be 15. The maximum possible number of inconsistencies is 30. Thus
one-half the triadic relations are inconsistent, the other half consistent,
and K = .50. A K of .20 would mean that four-fifths of the relations
were inconsistent and one-fifth consistent.
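The arithmetic for K and d can be checked with a short script. What follows is a minimal sketch rather than a definitive routine: the function name `consistence` is hypothetical, the input is assumed to be the row sums of a complete paired-comparison table, and the formulas are those given above for odd and even N.

```python
def consistence(row_sums):
    """Return (K, d) from the row sums of a complete paired-comparison table."""
    n = len(row_sums)
    mean_r = (n - 1) / 2                      # mean of the row sums
    ss = sum((r - mean_r) ** 2 for r in row_sums)
    if n % 2 == 1:                            # N odd: minimum sum of squares is 0
        k = 12 * ss / (n * (n ** 2 - 1))
        d = n * (n ** 2 - 1) * (1 - k) / 24
    else:                                     # N even: minimum sum of squares is N/4
        k = (12 * ss - 3 * n) / (n * (n ** 2 - 4))
        d = n * (n ** 2 - 4) * (1 - k) / 24
    return k, d

# Row sums R for objects A to I in Table 14.4
row_sums_14_4 = [5, 6, 5, 7, 4, 3, 3, 1, 2]
k, d = consistence(row_sums_14_4)
print(round(k, 2), round(d))
```

Only the row sums are needed, which is why the function does not take the full table of 1s and 0s.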


14.13

The significance of the coefficient of consistence

The significance of the coefficient of consistence may be approached by
considering the distribution of the number of inconsistent triadic relations d
where choices are made at random. Kendall (1955) provides a table of
probabilities that particular values of d will be attained or exceeded for N = 2
to 7. For N > 7, Kendall has shown that a $\chi^2$ test may be used which
provides approximate probabilities. The quantity

$$\chi^2 = \frac{8}{N - 4}\left(\tfrac{1}{4} C_3^N - d + \tfrac{1}{2}\right) + df \tag{14.21}$$

has an approximate $\chi^2$ distribution with degrees of freedom given by

$$df = \frac{N(N - 1)(N - 2)}{(N - 4)^2} \tag{14.22}$$

The term $C_3^N$ in the expression for $\chi^2$ is the number of combinations of N
things taken three at a time, or $N!/[3!(N - 3)!]$. In using this test the
required probability that a value of d equal to or greater than that
obtained will result where choices are allotted at random is the complement
of the probability for $\chi^2$.





For the data of Table 14.4, N = 9 and d = 15. We have

$$df = \frac{9 \times 8 \times 7}{(9 - 4)^2} = 20.16$$

$$\chi^2 = \frac{8}{9 - 4}\left(\tfrac{1}{4} \times 84 - 15 + \tfrac{1}{2}\right) + 20.16 = 30.56$$

With about 20 degrees of freedom, the probability of obtaining a value of
$\chi^2$ as large as this or larger lies between .05 and .10. This is the
probability that a value of d equal to or less than 15 will result where
choices are allotted at random; its complement, between .90 and .95, is the
probability that d will equal or exceed 15. The consistency represented in
the data is thus somewhat greater than the average expected on the
assignment of choices at random, but the coefficient of consistence K = .50
falls short of significance at the .05, or 5 per cent, level.
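The two quantities used in this test are easily computed. The sketch below assumes the form of the approximation implied by the worked figures, that is, $\chi^2 = \frac{8}{N-4}(\frac{1}{4}C_3^N - d + \frac{1}{2}) + df$ with $df = N(N-1)(N-2)/(N-4)^2$; the function name `triad_chi_square` is hypothetical. The probability itself must still be read from a table of $\chi^2$.

```python
from math import comb

def triad_chi_square(n, d):
    """Approximate chi-square and its degrees of freedom for d circular triads.

    Intended for N > 7, where the approximation is said to apply.
    """
    df = n * (n - 1) * (n - 2) / (n - 4) ** 2
    # comb(n, 3) is the number of combinations of N things taken three at a time
    chi2 = 8 / (n - 4) * (comb(n, 3) / 4 - d + 0.5) + df
    return chi2, df

chi2, df = triad_chi_square(9, 15)
print(round(chi2, 2), round(df, 2))
```

Because df is seldom a whole number, the nearest tabled value of df is ordinarily used when consulting the table.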

EXERCISES 

1 Calculate Spearman's rank correlation coefficient for the following 
paired ranks: 

X 1 2 3 4 5 6 7 8

Y 2 4 5 1 6 3 8 7

Does the coefficient obtained differ significantly from zero?

2 Convert the following measurements to ranks: 

X 4 4 7 7 7 9 16 17 21 25 

Y 8 16 8 8 16 20 12 15 25 20 

Calculate Spearman's rank correlation coefficient. Does the coefficient 
obtained differ significantly from zero? 

3 Are the following values of $\rho$ significantly different from zero? (a)
$\rho$ = .30 for N = 25, (b) $\rho$ = .60 for N = 15, (c) $\rho$ = .70 for N = 10.

4 Calculate the statistic S for the following sets of paired ranks:

    a   X   1     2     3     4     5     6
        Y   6     4     2     1     3     5

    b   X   1     2     3     4     5     6
        Y   6     3.5   1.5   1.5   3.5   5

    c   X   2     2     2     3     4     5
        Y   6     3.5   1.5   1.5   3.5   5

    d   X   2     2     2     5     5     5
        Y   6     4     2     1     3     5

    e   X   2     2     2     5     5     5
        Y   6     3.5   1.5   1.5   3.5   5

    f   X   2     2     2     5     5     5
        Y   5     5     2     2     2     5





5 Calculate the sampling variances for the statistic S obtained for the 
sets of paired ranks in Exercise 4 above. In each case obtain the nor- 
mal deviate z with a continuity correction. 


6 Three judges rank order a group of seven students on an examination 
as follows: 

Judge Student 


A 

B 

r 


f 

6 

7 

6 


Compute the Spearman rank coefficients between judges and the coef- 
ficient of concordance. 


7 A supervisor ranks six employees A, B, C, D, E, and F on job per-
formance using the method of paired comparisons. The data are as
follows: A → B, A → C, A → D, E → A, F → A, B → C, D → B,
B → E, B → F, C → D, C → E, F → C, D → E, D → F, E → F.
Calculate the coefficient of consistence for these data. How may this
coefficient be interpreted?

8 Calculate a coefficient of consistence for the table composed of the first 
five rows and columns of Table 14.4. 



Other Varieties
of Correlation



15.1

Introduction


We have hitherto considered product-moment correlation for use with
continuous variables of the interval and ratio type. We have considered
also rank-order correlation methods for use with ordinal data. Many
other varieties of correlation have been developed. These have application
to particular types of problems. In many instances, although not
all, these are particular cases of the more general product-moment
correlation and are derived on the basis of particular conditions or
assumptions. In this chapter we shall discuss the contingency coefficient, the phi
coefficient or fourfold point correlation, point biserial and biserial
correlation, tetrachoric correlation, and the correlation ratios. The contingency
coefficient is a descriptive measure of the association between nominal
variables. The phi coefficient is applicable to 2 × 2 tables when the
dichotomous variables are assumed to be discrete. Point biserial and
biserial correlations are applicable to tables comprised of 2 columns and
R rows, R > 2. Point biserial correlation assumes that the two-categoried
variable is discrete. Biserial correlation assumes that the two-
categoried variable is in fact continuous and normally distributed.
Tetrachoric correlation is a form of correlation for use with 2 × 2 tables,
which in many instances may be reductions of larger tables. It assumes
that both underlying variables are normally distributed. The correlation
ratios are applicable when the regression lines are nonlinear.


15.2

The contingency coefficient 

The contingency coefficient is a nominal statistic. It is a descriptive 
measure of association between nominal variables. It may be calculated 
on tables comprised of any number of rows and columns, greater, of 
course, than 1. As a nominal statistic it is independent of the ordering of 
the rows and columns of the contingency table. The arrangement of the
rows and columns may be changed, and the numerical value of the
coefficient remains unaltered. The formula for the contingency coefficient is
usually stated in terms of $\chi^2$. As before [Eq. (13.1)], we define $\chi^2$ as

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where O is the observed and E the expected cell frequency. The
contingency coefficient is then given by

$$C = \sqrt{\frac{\chi^2}{N + \chi^2}} \tag{15.1}$$

where N is the total number of observations.

In Table 13.7 of Chap. 13, a 3 × 3 contingency table is presented
showing the relationship between eye and hand laterality. For this table
N = 413 and $\chi^2$, as calculated in Table 13.8, is 4.02. The contingency
coefficient for these data is

$$C = \sqrt{\frac{4.02}{413 + 4.02}} = .098$$

A value of C = .098 indicates almost a complete absence of association
between eye and hand laterality.
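The calculation may be sketched in a few lines. The function names below are hypothetical; `chi_square` assumes the usual expected frequencies obtained from the marginal totals, and the small table passed to it is invented solely to show a case of complete independence.

```python
from math import sqrt

def chi_square(table):
    """Chi-square for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            total += (observed - expected) ** 2 / expected
    return total

def contingency_coefficient(chi2, n):
    """C = sqrt(chi2 / (N + chi2))."""
    return sqrt(chi2 / (n + chi2))

# Values quoted in the text for the eye and hand laterality data
c_laterality = contingency_coefficient(4.02, 413)
print(round(c_laterality, 3))

# Hypothetical table with proportional rows: chi-square and C are both zero
independent = [[10, 20], [20, 40]]
print(contingency_coefficient(chi_square(independent), 90))
```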

The minimum value of C is zero. C is zero when the two variables 
are independent. C cannot take negative values. The concepts of posi- 
tive and negative imply direction based on an ordering of categories or 
classes. For a strictly nominal variable, the concept of order is without 
meaning. In many practical instances, where contingency coefficients 
are used, an order is observed in the data. If, for example, left-handedness
were associated with left-eyedness, this might be considered a positive
association. If, however, left-handedness were associated with
right-eyedness, this might be considered a negative association. Some
investigators may choose to attach a positive or negative sign to a
contingency coefficient to indicate direction when this has meaning in relation
to the data.

The maximum value of the contingency coefficient depends on the
number of categories of the variables. For square contingency tables
the number of rows is equal to the number of columns and the maximum
value of C is given by

$$C_{\max} = \sqrt{\frac{k - 1}{k}} \tag{15.2}$$

where k is the number of arrays, either rows or columns. Thus for a 2 × 2
table the upper limit of C is $\sqrt{1/2} = .707$; for a 3 × 3 table the
maximum value is $\sqrt{2/3} = .816$. Maximum values for k = 2 to k = 10
are as follows:

    Number of categories
    for both variables        Maximum C
            2                   .707
            3                   .816
            4                   .866
            5                   .894
            6                   .913
            7                   .926
            8                   .935
            9                   .943
           10                   .949


We observe that as the number of categories increases, C approaches 1
as a limit. The dependence of C on the number of categories raises
difficulties of interpretation. It means that different values of C are not directly
comparable unless based on tables having the same number of rows and
columns. Thus a contingency coefficient based on a 2 × 2 table may be
compared directly with one based on another 2 × 2 table. It is not
directly comparable, however, with one based on a 3 × 3 or 3 × 4 table.
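The tabled maxima follow directly from the expression $\sqrt{(k - 1)/k}$ for square tables. A brief sketch (the function name `max_contingency` is hypothetical) reproduces them:

```python
from math import sqrt

def max_contingency(k):
    """Upper limit of C for a square k x k contingency table."""
    return sqrt((k - 1) / k)

for k in range(2, 11):
    print(k, f"{max_contingency(k):.3f}")
```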

The sampling distribution of the contingency coefficient is a matter
of some complexity. To test the significance of an obtained value of C a
knowledge of its sampling distribution is unnecessary. To compute C we
require $\chi^2$. We may test the significance of C by consulting a table to
ascertain whether or not the $\chi^2$ is significant.

In computing C, considerations pertaining to small cell frequencies in
relation to $\chi^2$, as described in Sec. 13.6, apply.

15.3

The phi coefficient

The phi coefficient, or fourfold point correlation, is applicable to 2 × 2
tables only. It is related to $\chi^2$. The two dichotomous variables are
assumed to be discrete, and the two categories of each to be amenable to
appropriate representation by two point values. In practice it is widely
used when the two variables are obviously not discontinuous.

One formula for calculating the phi coefficient, or $\phi$, is

$$\phi = \frac{BC - AD}{\sqrt{(A + B)(C + D)(A + C)(B + D)}} \tag{15.3}$$

where A, B, C, and D are the four cell frequencies. The term in the
denominator of the above expression is the square root of the product of
the four marginal totals.

Table 15.1 shows a 2 × 2 table illustrating the relationship between
two psychological test items. The value of $\phi$ based on this table is .376.
The reader will note that in this example the two underlying variables 
may be regarded as continuous. The categories “pass” and “fail” may 
be considered a dichotomy of an underlying continuous ability variable. 
Individuals above a certain threshold value on the ability variable pass 
the item; those below it fail the item. 

The phi coefficient is related to $\chi^2$ calculated on a 2 × 2 table by the
expression

$$\phi = \sqrt{\frac{\chi^2}{N}}$$

or

$$\chi^2 = N\phi^2 \tag{15.4}$$

Any formula for calculating $\chi^2$ for a 2 × 2 table may with minor
modification be used for calculating $\phi$.

Table 15.1

Computation of phi coefficient of correlation
between two test items

                          Frequency                      Proportion
                           Item 2                          Item 2
                     Fail     Pass    Total         Fail     Pass    Total
    Item 1   Pass   11 (A)   19 (B)     30           .22      .38     .60 ($p_i$)
             Fail   15 (C)    5 (D)     20           .30      .10     .40 ($q_i$)
             Total    26       24       50           .52      .48    1.00
                                                   ($q_j$)  ($p_j$)

$$\phi = \frac{19 \times 15 - 11 \times 5}{\sqrt{30 \times 20 \times 24 \times 26}} = .376$$

Alternative formulas for computing $\phi$ may be stated. In psychological-test
statistics it is conventional to represent the proportion passing
item i by $p_i$ and those failing by $q_i$, where $p_i = 1 - q_i$. Similarly, the
proportion passing item j is $p_j$ and the proportion failing $q_j$. The
proportion passing both items i and j is represented by $p_{ij}$. The $\phi$ coefficient of
correlation between two test items may then be written as

$$\phi = \frac{p_{ij} - p_i p_j}{\sqrt{p_i q_i p_j q_j}} \tag{15.5}$$

For the example of Table 15.1, $p_{ij} = .38$ and the phi coefficient is

$$\phi = \frac{.38 - .60 \times .48}{\sqrt{.60 \times .40 \times .48 \times .52}} = .376$$

which checks with the result previously obtained. When one of the
variables is evenly divided, $p_i = q_i = .50$, the formula for $\phi$ simplifies to

$$\phi = \frac{2p_{ij} - p_j}{\sqrt{p_j q_j}} \tag{15.6}$$

When both variables are evenly divided and $p_i = q_i = p_j = q_j = .50$,
the formula becomes

$$\phi = 4p_{ij} - 1 \tag{15.7}$$

The phi coefficient has been widely used in statistical work associated with 
psychological tests. Usually when investigators speak of the correlation 
between dichotomously scored test items, the reference is to the phi 
coefficient. 

The phi coefficient is a particular case of the product-moment
correlation coefficient. If we assign integers, say, 1 and 0, to represent the
two categories of each variable and calculate the product-moment
correlation coefficient in the usual way, the result will be identical with $\phi$.
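This identity can be verified numerically. The sketch below is offered under stated assumptions: the cell frequencies A = 11, B = 19, C = 15, D = 5 are those appearing in the computation for Table 15.1, the assignment of 1 to "pass" and 0 to "fail" is one arbitrary choice of integers, and the function names are hypothetical.

```python
from math import sqrt

def phi_from_cells(a, b, c, d):
    """phi = (BC - AD) / square root of the product of the four marginal totals."""
    return (b * c - a * d) / sqrt((a + b) * (c + d) * (a + c) * (b + d))

def pearson_r(xs, ys):
    """Ordinary product-moment correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Top row: item 1 passed (A = item 2 failed, B = item 2 passed);
# bottom row: item 1 failed (C = item 2 failed, D = item 2 passed).
a, b, c, d = 11, 19, 15, 5

# Expand the four cells into 1/0 scores for the 50 individuals
item1 = [1] * (a + b) + [0] * (c + d)
item2 = [0] * a + [1] * b + [0] * c + [1] * d

phi = phi_from_cells(a, b, c, d)
r = pearson_r(item1, item2)
print(round(phi, 3), round(r, 3))
```

Both routes give the same value, which illustrates why no separate theory is needed for $\phi$.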

The phi coefficient has a minimum value of −1 in the case of perfect
negative and a maximum value of +1 in the case of perfect positive
association. These limits, however, can be attained only when the two
variables are evenly divided; that is, $p_i = q_i = p_j = q_j = .50$. When
the variables are the same shape, $p_i = p_j$ and $q_i = q_j$, but are
asymmetrical, $p_i \neq q_i$ and $p_j \neq q_j$, one or the other of the limits −1 or +1
may be attained but not both. The maximum and minimum values of
phi are clearly influenced by the marginal totals. Consider the following
2 × 2 tables:

           1                      2                      3                      4
         -    +                 -    +                 -    +                 -    +
    +    0   50 | 50       +   50    0 | 50       +   20   60 | 80       +   40   40 | 80
    -   50    0 | 50       -    0   50 | 50       -   20    0 | 20       -    0   20 | 20
        50   50                50   50                40   60                40   60

In tables 1 and 2 both variables are evenly divided and coefficients of +1
and −1 are possible. Table 3 represents the maximum positive association
possible, given the restriction of the marginal totals. The phi
coefficient is .612. Table 4 shows the most extreme negative association
possible with the same marginal totals. The phi coefficient is −.408. For
this particular set of marginal totals phi can extend from a minimum of
−.408 to a maximum of .612.
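The limits imposed by a given set of marginal totals may be found by trying every table consistent with those totals. The following is a minimal sketch of that idea; the function name `phi_range` and its argument order are hypothetical, and the cell layout assumed is the one used in the four tables (rows + and −, columns − and +).

```python
from math import sqrt

def phi_range(row_plus, row_minus, col_minus, col_plus):
    """Smallest and largest phi attainable with the marginal totals held fixed."""
    denom = sqrt(row_plus * row_minus * col_minus * col_plus)
    values = []
    # b is the frequency in the (+, +) cell; the margins fix the other cells
    for b in range(min(row_plus, col_plus) + 1):
        a = row_plus - b          # (+, -) cell
        d = col_plus - b          # (-, +) cell
        c = row_minus - d         # (-, -) cell
        if min(a, c, d) < 0:
            continue              # not a possible table for these margins
        values.append((b * c - a * d) / denom)
    return min(values), max(values)

low, high = phi_range(80, 20, 40, 60)   # margins of tables 3 and 4
print(round(low, 3), round(high, 3))
print(phi_range(50, 50, 50, 50))        # margins of tables 1 and 2
```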

While the influence of the marginal totals on the range of values of 
phi may in some of its applications prove to be a disadvantage, this effect 
is in no way inconsistent with correlation theory. If a correlation coef- 
ficient is viewed as a measure of the efficacy of prediction, then perfect 
prediction in both a positive and a negative direction is possible only when 
the two distributions have the same shape and are symmetrical. If one 
variable is normally distributed and the other is rectangular, perfect pre- 
diction of the one from the other is not possible and the correlation coef- 
ficient reflects this fact. Perfect prediction in one direction requires 
only identity of shape; perfect prediction in both directions requires 
symmetry also. The phi coefficient, although affected by the marginal 
totals, is a measure of the efficacy of prediction. From this viewpoint it 
quite rightly reflects the loss in degree of prediction resulting from the
lack of concordance of the two marginal distributions.

Because $\chi^2 = N\phi^2$, we can readily test the significance of $\phi$ by
referring $N\phi^2$ to a chi-square table with 1 degree of freedom. When df = 1,
$\sqrt{\chi^2}$ is a normal deviate and we may refer $\phi\sqrt{N}$ to tables of the normal
curve. In sampling from a population where no association exists, the
distribution of $\phi$ should be approximately normal with a standard error
of $1/\sqrt{N}$. Of course, all considerations pertaining to small frequencies
(Sec. 13.6) apply here. N should, clearly, not be too small.


15.4

Point biserial correlation

Point biserial correlation provides a measure of relationship between a
continuous variable and a two-categoried, or dichotomous, variable.
The data when arranged in a frequency distribution take the form of a 
table comprised of R rows and 2 columns. The dichotomous variable is 
assumed to be discrete. For example, the continuous variable may be 
scores on a psychological test and the dichotomous variable may be male 
or female, or high school graduates and university graduates, or owning a 
television set and not owning a television set. Point biserial correlation 
is frequently applied in practice where the underlying dichotomous
variable is not discrete. For example, “pass” or “fail” on a psychological-test
item may be interpreted to be a dichotomy of an underlying continuous
ability variable. “Normal” versus “neurotic” may be considered
a somewhat arbitrary division of a continuous neuroticism dimension. 
Success or failure in an occupation may be viewed as a dichotomy of a 
continuous variable extending from exalted achievement to abysmal 
defeat. 

Point biserial correlation is a product-moment correlation and is a
particular case of the formula $r = \sum (X - \bar{X})(Y - \bar{Y})/(N - 1)s_x s_y$. If we
assign a 1 to individuals in one category and a 0 to individuals in the other
and calculate the product-moment correlation, the result is a point biserial
coefficient. Weights other than 1 and 0 may be assigned to the
categories. The coefficient is in no way dependent on the weights assigned.

The formula for point biserial r is

$$r_{pbi} = \frac{\bar{X}_p - \bar{X}_q}{s_t}\sqrt{pq} \tag{15.8}$$

where $s_t$ = standard deviation of all scores on continuous
variable, defined as $\sqrt{\sum (X - \bar{X})^2/N}$
p and q = proportions of individuals in two categories
of dichotomous variable
$\bar{X}_p$ and $\bar{X}_q$ = mean scores on continuous variable of
individuals within the two categories

Thus if the continuous variable is a set of error scores on a maze test
designed to provide a measure of animal “intelligence,” and the two
categories of the dichotomous variable are samples of “dull” and “bright”
strains of rats, then $\bar{X}_p$ is the mean error score on the maze test of the dull
and $\bar{X}_q$ is the mean error score of the bright rats. In this example a high
error score means low intelligence. The direction of the correlation must
be determined by inspection of the data.

To illustrate the calculation of point biserial correlation from
ungrouped data consider Table 15.2. This table presents scores on an
“anxiety” inventory for a group of 14 individuals, 8 of whom are described
as “normal” and 6 as “neurotic.” The higher the score on the inventory,
the greater the anxiety. The mean inventory score $\bar{X}_p$ for the six
neurotics is 38.15, and the mean $\bar{X}_q$ for the eight normals is 23.88. A
comparison of these means suggests that the test discriminates between the
two groups. The standard deviation of inventory scores is 18.19. The
proportions p and q of neurotics and normals are .43 and .57. The point
biserial correlation is .39.
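That the point biserial coefficient is simply a product-moment correlation, unaffected by the weights assigned to the two categories, can be checked with a short sketch. The scores and category memberships below are hypothetical values invented for illustration (they are not the data of Table 15.2), and the function names are likewise assumptions.

```python
from math import sqrt

def point_biserial(scores, in_p):
    """r_pbi = (mean_p - mean_q) / s_t * sqrt(p * q), with s_t computed over N."""
    n = len(scores)
    group_p = [x for x, flag in zip(scores, in_p) if flag]
    group_q = [x for x, flag in zip(scores, in_p) if not flag]
    p, q = len(group_p) / n, len(group_q) / n
    mean_all = sum(scores) / n
    s_t = sqrt(sum((x - mean_all) ** 2 for x in scores) / n)
    mean_p = sum(group_p) / len(group_p)
    mean_q = sum(group_q) / len(group_q)
    return (mean_p - mean_q) / s_t * sqrt(p * q)

def pearson_r(xs, ys):
    """Ordinary product-moment correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))

# Hypothetical scores and category membership (illustration only)
scores = [3, 5, 6, 8, 9, 12, 14, 15]
in_p = [False, False, True, False, True, True, False, True]

r_formula = point_biserial(scores, in_p)
r_coded_1_0 = pearson_r(scores, [1 if f else 0 for f in in_p])
r_coded_10_3 = pearson_r(scores, [10 if f else 3 for f in in_p])
print(r_formula, r_coded_1_0, r_coded_10_3)
```

The three printed values agree, whichever pair of weights is used, provided the category labelled p receives the larger weight; reversing the weights reverses only the sign.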

In this example the point biserial correlation coefficient is a measure
of the capacity of the anxiety inventory to discriminate between the two
clinical groups. This statistic can always be interpreted as a measure
of the degree to which the continuous variable differentiates, or
discriminates, between the two categories of the dichotomous variable. The
reader will note in Table 15.2 that if the eight individuals making the
lowest inventory scores were normals and the six making the highest scores
were neurotics, the point biserial would be a maximum for these data.
Also, if the labels “normal” and “neurotic” were arranged more or less at
random in relation to score, the difference between $\bar{X}_p$ and $\bar{X}_q$, and also
$r_{pbi}$, would tend to zero.

An alternative method of calculating point biserial correlation is
given by

$$r_{pbi} = \frac{\bar{X}_p - \bar{X}_t}{s_t}\sqrt{\frac{p}{q}} \tag{15.9}$$

where $\bar{X}_t$ is the mean of all scores on the continuous variable. As in
(15.8) the standard deviation is defined as $s_t = \sqrt{\sum (X - \bar{X})^2/N}$.


Table 15.2

Calculation of point biserial correlation from ungrouped data

    Individual    Inventory score    Clinical description
         1               6           Normal
         2               8           Neurotic
         3               8           Normal
         4              11           Normal
         5              16           Neurotic
         6              25           Normal
         7              27           Normal
         8              31           Normal
         9              31           Neurotic
        10              39           Normal
        11              44           Normal
        12              50           Neurotic
        13              56           Neurotic
        14              68           Neurotic

Mean score for neurotics: $\bar{X}_p = 38.15$
Mean score for normals: $\bar{X}_q = 23.88$

$s_t = 18.19$     $p = 6/14 = .43$     $q = 8/14 = .57$





Point biserial correlation is not independent of the proportions in
the two categories. When p = q = .50, its maximum and minimum
values will differ from those when, say, p = .20 and q = .80. The
maximum value of $r_{pbi}$ never reaches +1; the minimum value never reaches
−1. In predicting a two-categoried variable from a continuous variable,
perfect prediction is possible and occurs when the two frequency
distributions do not overlap. Perfect prediction of a continuous variable
from a two-categoried variable is obviously impossible. Some error in
prediction must always occur in predicting a variable which may take a
wide range of values from a variable which may take two values only.
The point biserial correlation coefficient reflects this fact. It is worth
noting here that the regression line obtained by calculating the means of
the two columns is of necessity linear, there being only two points. The
regression line obtained by calculating the means of the rows cannot be
linear except under certain special circumstances.

To test the significance of $r_{pbi}$ from zero the situation may be treated
as one requiring a comparison of the two means $\bar{X}_p$ and $\bar{X}_q$. The
appropriate value of t may be written

$$t = r_{pbi}\sqrt{\frac{N - 2}{1 - r_{pbi}^2}} \tag{15.10}$$

The number of degrees of freedom is N − 2. This is a two-tailed test.
For large N the quantity $1/\sqrt{N}$ may be used as the standard error of
$r_{pbi}$ in testing the significance of the difference from zero.


15.5

Biserial correlation

Biserial correlation is a measure of the relationship between a continuous
and a dichotomous variable, it being assumed that the variable underlying
the dichotomy is continuous and normal. If a bivariate table comprised
of R rows and C columns is dichotomized and reduced to a table of R rows
and 2 columns, biserial correlation will be a more accurate estimate of the
correlation based on the R × C table than point biserial correlation.
One of its applications is in the selection of items for psychological tests.
The biserial correlation of an item with total test score is frequently
used as a measure of the discriminatory power of the item.

The formula for calculating this coefficient is

$$r_{bi} = \frac{\bar{X}_p - \bar{X}_q}{s_t} \cdot \frac{pq}{y} \tag{15.11}$$

where $\bar{X}_p$ and $\bar{X}_q$ = mean scores on continuous variable
of individuals in two categories
p and q = proportions in two categories
$s_t$ = standard deviation of all scores,
defined as $\sqrt{\sum (X - \bar{X})^2/N}$
y = height of ordinate of unit normal
curve at point of division between
p and q proportions of cases

Thus if p = .30 and q = .70, by consulting the table of areas and
ordinates of the normal curve, Table A of the Appendix, we can ascertain that
the height of the ordinate y at the point of dichotomy is .348.

For the data of Table 15.2 we may, for illustrative purposes, assume
that the normal-versus-neurotic dichotomy is a division of a normally
distributed continuous variable. This assumption may or may not be
warranted in fact. For these data p = .43 and q = .57. The height
of the ordinate of the unit normal curve at the point of dichotomy is
y = .393, $\bar{X}_p = 38.15$, $\bar{X}_q = 23.88$, $s_t = 18.19$, and

$$r_{bi} = \frac{38.15 - 23.88}{18.19} \times \frac{.43 \times .57}{.393} = .49$$


An alternative formula for biserial correlation is

$$r_{bi} = \frac{\bar{X}_p - \bar{X}_t}{s_t} \cdot \frac{p}{y} \tag{15.12}$$

where $\bar{X}_t$ is the mean score for the total sample. Applying this formula to
the data of Table 15.2, we obtain $r_{bi} = .49$.

Theoretically, the maximum and minimum values of $r_{bi}$ are
independent of the point of dichotomy and are −1 and +1. An implicit
assumption underlying this statistic is that the continuous many-valued
variable is normal, as well as the variable underlying the dichotomy.
Values of $r_{bi}$ greater than unity can occur under gross departures from
normality.

Some difficulties surround the sampling distribution of $r_{bi}$. The
standard error of $r_{bi}$ in sampling from a population where the correlation
is zero is roughly

$$s_{r_{bi}} = \frac{1}{y}\sqrt{\frac{pq}{N}} \tag{15.13}$$

When N is large this formula may be used with reference to the normal
curve to test the significance of $r_{bi}$. It should, however, be used with
caution, because the probabilities thereby obtained are somewhat
inaccurate. The standard error tends to increase with the extremeness of the
dichotomies. The reader may wish to compare the standard error of
$r_{bi}$ with the corresponding large-sample formula for the ordinary product-
moment correlation $s_r = 1/\sqrt{N}$. The standard error of $r_{bi}$ is always
larger than the standard error of the ordinary product-moment
correlation. Where p = q = .5, the standard error of $r_{bi}$ is 1.25 times as large
as the standard error of r. Where p = .90 and q = .10, the standard
error of $r_{bi}$ is 1.71 times that of r. For further discussion of the sampling
distribution of $r_{bi}$, see Walker and Lev (1953).

The relation between biserial and point biserial correlation is given
by the expression

$$r_{bi} = r_{pbi}\frac{\sqrt{pq}}{y} \tag{15.14}$$

The factor $\sqrt{pq}/y$ varies from 1.25 where p = q = .5 to 3.73 where
p = .99 and q = .01. Thus $r_{bi}$ is always greater than $r_{pbi}$ and the
difference increases with extremeness of the dichotomies.
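The ordinate y and the factor $\sqrt{pq}/y$ need not be taken from a printed table. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later) in place of Table A; the function names are hypothetical, and the value .39 is the point biserial coefficient quoted for Table 15.2.

```python
from math import sqrt
from statistics import NormalDist

UNIT_NORMAL = NormalDist()  # mean 0, standard deviation 1

def ordinate_at_split(p):
    """Height y of the unit normal curve at the point cutting off proportion p."""
    z = UNIT_NORMAL.inv_cdf(p)
    return UNIT_NORMAL.pdf(z)

def biserial_factor(p):
    """sqrt(pq) / y, the multiplier that converts r_pbi to r_bi."""
    q = 1 - p
    return sqrt(p * q) / ordinate_at_split(p)

print(round(ordinate_at_split(0.30), 3))      # ordinate for p = .30, q = .70
print(round(biserial_factor(0.50), 2))        # even split
print(round(biserial_factor(0.99), 2))        # extreme split
print(round(0.39 * biserial_factor(0.43), 2)) # r_bi from r_pbi = .39
```

The same factor, divided by $\sqrt{N}$, gives the approximate standard error of $r_{bi}$ when the population correlation is zero.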

15.6

Tetrachoric correlation

Tetrachoric correlation is appropriate to data arranged in a 2 × 2, or
fourfold, table. It assumes that both variables underlying the
dichotomies are normally distributed. It has been used to provide a
convenient measure of correlation when graduated measurements have been
reduced to two categories. It is an estimate of product-moment
correlation. The tetrachoric correlation calculated on a 2 × 2 table should be
roughly the same as that calculated on the more highly graduated
R × C table, when the two variables are approximately normal in form.

Direct calculation of tetrachoric correlation coefficients is
algebraically complex and arithmetically laborious. Because of this, various
approximate methods and computation procedures have been devised.
A commonly used approximation is known as the cosine-pi formula, which
may be written in the form

$$r_t = \cos\left(\frac{180^\circ}{1 + \sqrt{BC/AD}}\right) \tag{15.15}$$

A, B, C, and D are the four cell frequencies. B and C are the high-high
and low-low and A and D the high-low and low-high cell frequencies.
The reader will recall that the cosine of an angle is the horizontal side of a
right-angle triangle divided by the hypotenuse, the side opposite the right
angle. The quantity in the parentheses of the above formula is an angle,
and its cosine is an estimate of the tetrachoric correlation. When
AD = 0, the case of perfect positive correlation, the quantity $\sqrt{BC/AD}$
is indefinitely large and $r_t = \cos 0^\circ$. A table of trigonometric functions
shows that the cosine of a zero angle is +1.00. When BC = 0, the
case of perfect negative correlation, the quantity $\sqrt{BC/AD} = 0$ and
$r_t = \cos 180^\circ$. The cosine of a 180° angle is −1.00. When BC = AD,
the case of independence, $\sqrt{BC/AD} = 1$ and $r_t = \cos 90^\circ$. The cosine
of a 90° angle is zero. If the angle is greater than 90°, the correlation is
negative.

Table 15.3 illustrates the application of formula 15.15. The amount of
calculation is trivial. Tables have been prepared which enable the rapid
determining of the cosine-pi approximation of $r_t$ from the ratio BC/AD,
this being the only calculation required. Such tables are reproduced in
Guilford (1956) and Edwards (1954).
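Where such tables are not at hand, the cosine-pi approximation is readily computed. The following is a minimal sketch; the function name `cosine_pi` is hypothetical, angles are handled in radians (180° being π), and the cell frequencies are those of Table 15.3 with B and C as the high-high and low-low cells.

```python
from math import cos, pi, sqrt

def cosine_pi(a, b, c, d):
    """Cosine-pi approximation to the tetrachoric correlation.

    b and c are the high-high and low-low frequencies,
    a and d the high-low and low-high frequencies.
    """
    if a * d == 0 and b * c == 0:
        raise ValueError("ratio BC/AD is undefined when both products are zero")
    if a * d == 0:
        return 1.0                      # BC/AD indefinitely large: cos 0 degrees
    ratio = sqrt((b * c) / (a * d))
    return cos(pi / (1 + ratio))        # pi radians = 180 degrees

r_t = cosine_pi(40, 76, 62, 55)         # cell frequencies of Table 15.3
print(round(r_t, 3))
print(cosine_pi(10, 10, 10, 10))        # BC = AD: effectively zero
```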

The cosine-pi formula provides an excellent approximation to the
tetrachoric correlation when the divisions of the two variables are equal,
p = q = .5. As the divisions depart from equality this formula tends to
overestimate the tetrachoric correlation. When the limits of the
divisions are between .4 and .6, the discrepancy in estimation is quite small,
its maximum value being about .02. For extreme divisions the
discrepancy is substantial. For a method of estimating tetrachoric r where the
divisions of the variables are not close to the medians, the reader is
referred to tables prepared by Jenkins (1955). See also note by Fishman
(1956).

Table 15.3

Calculation of tetrachoric correlation using
cosine-pi approximation

                          Occupation rating
                       Below         Above
                       average       average      Total
    Above median       40 (A)        76 (B)        116
    Below median       62 (C)        55 (D)        117
    Total               102           131          233

$$r_t = \cos\left(\frac{180^\circ}{1 + \sqrt{\dfrac{76 \times 62}{40 \times 55}}}\right) = \cos 73.08^\circ = .291$$


A formula for the standard error of a tetrachoric correlation in
sampling from a population where the population value is zero is given by

$$s_{r_t} = \frac{1}{y_1 y_2}\sqrt{\frac{p_1 q_1 p_2 q_2}{N}} \tag{15.16}$$

where $y_1$ and $y_2$ are the heights of the ordinates of the unit normal curve
at the points of dichotomy, and $p_1$, $q_1$ and $p_2$, $q_2$ are the proportions in the
two categories for the two variables. While this formula may be used
with reference to the unit normal curve to test the significance of an
observed $r_t$, the procedure is somewhat dubious because uncertainty
attaches to the nature of the sampling distribution of $r_t$. To test the
significance of a correlation on a 2 × 2 table the investigator is on much
safer ground using $\chi^2$ rather than concerning himself with use of the
standard error of $r_t$. This formula, however, permits a comparison with
the corresponding large-sample standard-error formula for product-
moment r, $s_r = 1/\sqrt{N}$. The standard error of $r_t$ is always greater than
the standard error of r. The magnitude of the error increases with
increase in the extremeness of the dichotomies. When the population
value of $r_t$ is zero and the two variables are evenly split, $s_{r_t}$ is about
1.57 times as large as $s_r$. When p = .90 and q = .10 for both variables,
$s_{r_t}$ is about 2.92 times as large as $s_r$. These considerations suggest that
the use of tetrachoric correlation is ill-advised when the dichotomies are
extreme.

Tetrachoric correlation has been used as a laborsaving device when
large numbers of correlations are required. This procedure is quite 
acceptable when N is large. Also, under these conditions it is possible to 
dichotomize close to the medians of the two variables. In any situation, 
however, the reduction of a multicategoried to a two-categoried variable 
results in a loss of information, which in the case of r t reflects itself in the 
standard error. 


15.7

The correlation ratios

The correlation ratios are descriptive of the relations between variables
when the regression lines are nonlinear. Although of some theoretical
interest, they have been infrequently used by psychologists. A common
example of a nonlinear relation occurs in the correlation of psychological-
test performance and chronological age when a broad age range is
covered. Performance usually shows a more rapid increase during the earlier
years than the later. The discussion of the correlation ratios given here is
rather cursory.

The reader will recall from Chap. 8 that the calculation of a product-
moment correlation involves in effect the calculation of two regression
lines. One line is used in predicting Y from X, and the other X from Y.
A discrepancy between an observed value and a predicted value, a point
on a regression line, is an error of estimate. If Y is an observed value and
Y' is an estimate of it predicted from X, then the difference Y − Y' is an
error of estimate. The sum of squares of the errors of estimate in
predicting Y from X is $\sum (Y - Y')^2$, where Y' is obtained from a linear
regression line. In predicting X from Y, the sum of squares of the errors
of estimate is $\sum (X - X')^2$. The correlation coefficient is related to these
sums of squares by the formula

$$r^2 = 1 - \frac{\sum (Y - Y')^2}{\sum (Y - \bar{Y})^2} = 1 - \frac{\sum (X - X')^2}{\sum (X - \bar{X})^2} \tag{15.17}$$

Consider now the following highly artificial bivariate frequency
table:

                       X
   Y      1    2    3    4    5    6   Total
   5                                      0
   4               10   10               20
   3           5              5          10
   2      1                        1      2
   1                                      0
 Total    1    5   10   10    5    1     32


If two regression lines are fitted to the means of rows and columns in this
table, these lines will be at right angles to each other, and the product-moment
correlation will be zero. Clearly, from a prediction viewpoint,
this correlation does not adequately describe the situation. In predicting
Y from X, perfect prediction is possible. If X is 1, then Y is 2; if X is 2,
then Y is 3; and so on. In predicting X from Y, however, prediction is
far from perfect. If Y is 2, then X may be either 1 or 6; if Y is 3, then X
may be either 2 or 5; if Y is 4, then X may be either 3 or 4. Thus the
prediction of Y from X is perfect, whereas the prediction of X from Y is
subject to gross error. This results from nonlinearity of regression, a
circumstance not unrelated to the shapes of the two marginal distributions.

In situations of this kind the correlation ratios may be used to
describe the relation between the variables. With product-moment
correlation an error of estimate is a deviation from a straight regression
line fitted to the means of rows or columns. With the correlation ratios
an error of estimate is simply a deviation from the mean of a row or
column. No regression lines are used. If $\bar{Y}_j$ is the mean of column j and
$Y_{ij}$ is the score of the ith individual in column j, then the difference
$Y_{ij} - \bar{Y}_j$ is an error of estimate. The sum of squares of these errors of
estimate is $\sum\sum (Y_{ij} - \bar{Y}_j)^2$. The corresponding sum of squares about the
means of rows is $\sum\sum (X_{ij} - \bar{X}_i)^2$. The correlation ratio which is
descriptive of the prediction of Y from X is defined as

$$\eta_{yx}^2 = 1 - \frac{\sum\sum (Y_{ij} - \bar{Y}_j)^2}{\sum (Y_{ij} - \bar{Y})^2} \tag{15.18}$$

and in predicting X from Y

$$\eta_{xy}^2 = 1 - \frac{\sum\sum (X_{ij} - \bar{X}_i)^2}{\sum (X_{ij} - \bar{X})^2} \tag{15.19}$$


In the case of perfect linearity of regression, a circumstance which does
not arise in practice because of sampling error, $\eta_{yx}^2 = \eta_{xy}^2 = r^2$. When
the regression lines are nonlinear, the two correlation ratios will differ
from each other and from the correlation coefficient. The correlation
ratio in general is equal to or greater than the correlation coefficient.
Thus $1 \ge \eta_{yx}^2 \ge r^2$.
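
The contrast between the two correlation ratios and the product-moment correlation can be checked numerically on the artificial table. The following is a minimal sketch, assuming the cell frequencies are as they appear to read in that table; the function names are illustrative only and do not come from the text.

```python
from collections import defaultdict

# Cell frequencies as (X, Y, count), as they appear to read in the
# artificial table (an assumption, since the table is only sketched).
CELLS = [(1, 2, 1), (2, 3, 5), (3, 4, 10), (4, 4, 10), (5, 3, 5), (6, 2, 1)]

def expand(cells):
    """Turn (x, y, count) cells into a list of individual (x, y) pairs."""
    return [(x, y) for x, y, count in cells for _ in range(count)]

def sum_sq(values):
    """Sum of squared deviations of the values from their own mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def correlation_ratio_sq(pairs):
    """Eta squared for predicting the second member of each pair from
    the first: 1 - (SS about array means) / (SS about the grand mean)."""
    arrays = defaultdict(list)
    for predictor, outcome in pairs:
        arrays[predictor].append(outcome)
    within = sum(sum_sq(group) for group in arrays.values())
    total = sum_sq([outcome for _, outcome in pairs])
    return 1 - within / total

def pearson_r(pairs):
    """Product-moment correlation computed from deviation scores."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cross = sum((x - mx) * (y - my) for x, y in pairs)
    return cross / (sum_sq(xs) * sum_sq(ys)) ** 0.5

pairs = expand(CELLS)
r = pearson_r(pairs)
eta_yx_sq = correlation_ratio_sq(pairs)                       # Y from X
eta_xy_sq = correlation_ratio_sq([(y, x) for x, y in pairs])  # X from Y
print(round(r, 4), round(eta_yx_sq, 4), round(eta_xy_sq, 4))
```

Under these frequencies the correlation ratio for predicting Y from X reaches its upper limit, while both r and the ratio for predicting X from Y are zero, which is the asymmetry the text describes.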

The discrepancy between $\eta_{yx}^2$ and $r^2$ is used as a measure of
nonlinearity of regression. The greater the difference, $\eta_{yx}^2 - r^2$, the greater
the departures from linearity. To test the significance of the departures
of a regression line from linearity, we calculate the quantity

$$F = \frac{(\eta^2 - r^2)/(k - 2)}{(1 - \eta^2)/(N - k)} \tag{15.20}$$


where k = number of arrays, either rows or columns 
N = total number of cases 

This ratio is an F ratio and may be referred to a table of F with k - 2
degrees of freedom associated with the numerator and N - k degrees of
freedom associated with the denominator. Note that two such tests may
be applied to any correlation table, one a test of the linearity of regression
of X on Y and the other of Y on X.

To test whether a correlation ratio is significantly different from zero,
we may use the F ratio:

$$F = \frac{\eta^2/(k - 1)}{(1 - \eta^2)/(N - k)} \tag{15.21}$$

This F ratio has k - 1 degrees of freedom associated with its numerator
and N - k degrees of freedom associated with its denominator. Quite
obviously, one correlation ratio may differ significantly from zero and the
other may not.
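
The two F ratios of formulas 15.20 and 15.21 are simple to compute once the correlation ratio, the correlation coefficient, k, and N are in hand. The sketch below uses invented values chosen only to make the arithmetic easy to follow; the function names are hypothetical, and the resulting F values would still have to be referred to a table of F.

```python
def f_nonlinearity(eta_sq, r_sq, k, n):
    """F ratio of formula 15.20, with k - 2 and n - k degrees of freedom."""
    return ((eta_sq - r_sq) / (k - 2)) / ((1 - eta_sq) / (n - k))

def f_eta_zero(eta_sq, k, n):
    """F ratio of formula 15.21, with k - 1 and n - k degrees of freedom."""
    return (eta_sq / (k - 1)) / ((1 - eta_sq) / (n - k))

# Hypothetical values, not taken from the text.
eta_sq, r_sq, k, n = 0.50, 0.30, 6, 66

f_lin = f_nonlinearity(eta_sq, r_sq, k, n)
f_zero = f_eta_zero(eta_sq, k, n)
print("departure from linearity: F =", round(f_lin, 2), "df =", (k - 2, n - k))
print("eta squared versus zero:  F =", round(f_zero, 2), "df =", (k - 1, n - k))
```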

Procedures for the practical computation of the correlation ratios 
are given in most statistics texts (see, for example, Guilford, 1956).

Although the correlation ratios are usually viewed as descriptive of 
the relations between variables when the regression lines are nonlinear, 
these ratios are in fact measures of the relation between a variable of the 
interval-ratio type and a nominal variable. The correlation ratios deal 
with the problem of nonlinearity by treating one of the variables as if it 
were a nominal variable, that is, by ignoring the question of the shape of 
the relation between the variables. 


EXERCISES 


1 What type of correlation coefficient is appropriate to describe the relation
between psychological-test scores and (a) sex, (b) age, (c) a pass-fail
criterion?


2 The following are data on the correlation between responses to two
test items:

                    Item 2
                 Fail   Pass   Total
 Item 1  Pass     40     30      70
         Fail     20     10      30
         Total    60     40     100

Compute the phi coefficient.

3 Compute for the data of Exercise 2 above the maximum and minimum 
values of phi. 






4 The following are data on the correlation between test scores and
responses on a test item: 


   Class            Item
  interval      Fail    Pass
   30-34                  1
   25-29          1       2
   20-24          6      15
   15-19         12      30
   10-14         15      18
    5-9          31      10
    0-4           7       4
   Total         72      80


Compute both the point biserial and the biserial correlation coefficients
for these data. 

5 Dichotomize the test scores in Exercise 4 above to obtain a 2 X 2 
table and calculate a tetrachoric correlation coefficient using the 
cosine-pi formula. 

6 In what way does the correlation ratio deal with the problem of
nonlinearity of regression?

7 Is $\eta^2$ = .70 for k = 10 and N = 50 significantly different from zero?



Transformations: 
Their Nature 
and Purpose 



16.1

Introduction


Many varieties of transformations are used in the interpretation and 
analysis of statistical data. A transformation is any systematic altera- 
tion in a set of observations whereby certain characteristics of the set are 
changed and other characteristics remain unchanged. The representation
of a set of observations X as deviations from the mean, $X - \bar{X} = x$,
is a simple transformation. The mean of the transformed values is zero.
All other characteristics of the transformed values are the same as those
of the original values. The variability, skewness, and kurtosis remain 
unchanged. The ordinal properties of the data are preserved. The rank 
ordering of the observations is the same as before. The transformation 
of a variable X to standard-score form $(X - \bar{X})/s = z$ results in a change
both in mean and standard deviation. The mean of the transformed 
values is zero, and the standard deviation is unity. Skewness, kurtosis, 
and rank order are unchanged. 

Certain commonly used transformations change the shape of the frequency
distribution of the variable. The variable may, for example, be
transformed to the normal form. This may involve not only a change in
mean and standard deviation, but also a change in skewness and kurtosis.
The original observations may be negatively skewed and leptokurtic.
The transformed values may be normally distributed, or approximately
so. This type of transformation does not change the rank order of the
observations. The transformations most commonly used by psychologists
that alter the shape of the frequency distribution are to the normal
and rectangular forms. The conversion of a set of observations to percentile
ranks is a transformation to a rectangular distribution.

The conversion of a set of frequencies $f_1, f_2, f_3, \ldots, f_k$ to proportions
by dividing each frequency by N, or to percentages by dividing by
N and multiplying by 100, is a simple transformation. The ordering of
the transformed values is the same as the ordering of the original frequencies.
If each frequency is divided by different values of N, say
$N_1, N_2, N_3, \ldots, N_k$, then the transformed values will quite probably
have an order different from the original values. The conversion of a 
mental age to an intelligence quotient by dividing by chronological age 
and multiplying by 100 is a transformation which changes the ordinal 
properties of the data. In converting mental ages to intelligence quo- 
tients, not only is the order changed, but also the mean, standard devia- 
tion, skewness, and kurtosis. The transformed values have a mean of 
100 in the standardization group and are approximately normally dis- 
tributed with a known standard deviation. 

Transformations are used for a variety of reasons. The use of trans- 
formed values may assist understanding and algebraic manipulation. 
The correlation coefficient, for example, may be written as

$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{(N - 1) s_x s_y}$$

Transformed to standard measure it becomes $r = \sum z_x z_y/(N - 1)$. The
correlation coefficient is observed under these circumstances to be a function
of the standard scores. This means in effect that the original values
X and Y may be transformed by the addition of constants, thus changing
the means $\bar{X}$ and $\bar{Y}$, and by multiplying by constants, thus changing the
standard deviations $s_x$ and $s_y$, and the correlation coefficient remains
unchanged. The correlation coefficient may be said to be independent of, 
or invariant under, transformations which involve adding or multiplying 
the variate values by constant factors. In ordinary algebraic work it is 
usually easier to manipulate standard scores than the original observa- 
tions. In computation considerable use is made of transformed values. 
For example, in calculating a mean from grouped data a computation 
variable may be used which is a deviation from an arbitrary origin in 
units of class interval. The mean of this variable is calculated, and a 
simple formula applied to convert this mean back to the mean of the 
original observations. The purpose of this is to save arithmetic. 
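
The invariance of the correlation coefficient under the addition of constants and multiplication by constants can be verified directly. The sketch below computes r from standard scores, as in the standard-measure form given above, for a small set of invented paired observations and again after a linear change of origin and unit. The data and the constants are hypothetical, and the multipliers are assumed to be positive, since a negative multiplier would reverse the sign of r.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """r as the sum of products of standard scores divided by N - 1.
    statistics.stdev uses the N - 1 divisor, matching that form."""
    mx, my, sx, sy = mean(xs), mean(ys), stdev(xs), stdev(ys)
    zx = [(x - mx) / sx for x in xs]
    zy = [(y - my) / sy for y in ys]
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)

# Hypothetical paired observations.
x = [2, 4, 5, 7, 9, 12]
y = [1, 3, 2, 6, 8, 7]

r_original = pearson_r(x, y)

# Change the origin and the unit of each variable (positive multipliers).
x_new = [3 * v + 10 for v in x]
y_new = [0.5 * v - 4 for v in y]
r_transformed = pearson_r(x_new, y_new)

print(round(r_original, 4), round(r_transformed, 4))
```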

In forms of correlational analysis, involving a number of variables, 
the distributions of the variables may assume a variety of shapes. Some 
may be negatively and others positively skewed. Some may be platykurtic,
and others leptokurtic. If correlation coefficients are computed,
these coefficients will not be altogether independent of the differences in 
the shapes of the distributions. To achieve comparability it is a common 
practice to transform all variables to an approximately normal form and 
compute the correlations on transformed values. Such transformations 
may also have the related effect of improving the fit of linear regression 
lines to the data. 

Raw scores on psychological tests are usually highly arbitrary. The
values of the mean, standard deviation, and possible range of scores reside
in large measure in the predilection of the test constructor. Unless the 
mean, standard deviation, and something about the shape of the score 
distribution are known, no proper interpretation can be attached to the
original, or raw, scores. Such scores are frequently transformed to nor- 
mal distributions with an agreed mean and standard deviation. For 
example, a psychological test when administered to a representative 
sample of individuals from the population for which the test is intended 
may have a mean of 37 and a standard deviation of 9.6 and be positively 
skewed. Scores may be transformed to a normally distributed variable 
with a mean of 100 and a standard deviation of 16. Scores thus trans- 
formed to the normal form immediately take on meaning. If an indi- 
vidual has a score of 116, we know that he is one standard deviation unit 
above the average. Because the scores are normally distributed we know 
that his performance is better than that of about 84 per cent of the population
and below the performance of about 16 per cent of the population.
The procedure for developing such a transformation is known as stand- 
ardization. A psychological test is said to be standardized when trans- 
formed scores are available, based on a reference group of acceptable size. 
The transformed scores themselves are called norms. An individual’s 
score takes on meaning in relation to a standard, or normative, group.
Tests are frequently standardized to permit age allowances. This means 
in effect that separate norms have been prepared for each age group. The 
average child in each age group may have a mean transformed score of, 
say, 100. The standard deviation of scores for each age group may be 16. 
Thus a younger child may make a lower raw score than an older child but 
have a considerably higher transformed score. Intelligence quotients are 
transformed scores which make adjustments for the differing chrono- 
logical ages of children taking the test. Intelligence quotients are pre- 
sumed to be independent of chronological age within an accepted age 
range. Most published tests are accompanied by manuals containing 
conversion tables which permit the transformation of raw scores to 
standardized scores. Both normal and rectangular transformations are 
used in test standardization.

The application of a t test for the significance of the difference 
between two means assumes normality and equality of variance of the 
population distributions. The same assumptions underlie the use of the 
analysis of variance. In practice, data are often encountered which 
depart appreciably from the normal form and with unequal variances.
Here the investigator has several avenues open to him. If the departures 
from normality and equality of variance are not too gross, he may apply 
the usual procedures, knowing that the data do not satisfy the assumptions
required, and impose upon himself a more rigorous level of significance.
Fairly marked departures from normality may occur, and
the tests of significance will not be too seriously affected. Where the 
departures from normality and equality of variance are gross, a transformation
is sometimes used. Square-root and logarithmic transformations
are appropriate to certain classes of data. A square-root transformation
converts X to $\sqrt{X}$; a logarithmic transformation converts
X to log X. Under the special circumstances where they are appropriate,
these transformations may achieve approximate normality and equality 
of variance. 

An example of the practical utility of a transformation is Fisher's $z_r$
transformation used in tests of significance of correlation coefficients,
described in Chap. 12. The variance and shape of the sampling distribution
of the correlation coefficient vary as a function of the population
value $\rho$. The transformed values $z_r$ are approximately normally
distributed and nearly independent of $\rho$ with a standard deviation close
to $1/\sqrt{N - 3}$.
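
As a brief illustration, Fisher's transformation can be written in a line or two. The sketch assumes the usual definition, $z_r = \tfrac{1}{2}\ln[(1 + r)/(1 - r)]$, which is the inverse hyperbolic tangent of r, and the usual approximation $1/\sqrt{N - 3}$ for its standard deviation; the function names and the sample size of 28 are illustrative only.

```python
import math

def fisher_z(r):
    """Fisher's z_r = 0.5 * ln((1 + r) / (1 - r)), which equals atanh(r)."""
    return math.atanh(r)

def fisher_z_sd(n):
    """Usual approximation to the standard deviation of z_r for sample size n."""
    return 1 / math.sqrt(n - 3)

# The spacing of z_r grows as r approaches 1, while its standard
# deviation depends on the sample size only (28 is a hypothetical N).
for r in (0.10, 0.50, 0.90):
    print(r, round(fisher_z(r), 3), round(fisher_z_sd(28), 3))
```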

Tests of significance may be applied which are independent of the
shapes of the population distributions. These tests are known as
distribution-free, or nonparametric, tests (Chap. 22). Such tests in effect
transform the original measurements to ranks or signs. A rank transformation
simply converts measurements to the integers 1, 2, . . . , N.
Subsequent calculation and interpretation are based on these integers.
A sign transformation converts the measurements to plus and minus
signs. Observations above a median value may be assigned a plus, and
those below a minus. The reduction of data to their rank and sign properties
leads to a loss of information. More observations are required to
achieve significance at an accepted level of significance.

The reader will observe that a persisting theme underlying many 
transformations is independence, or invariance. By transforming the 
original observations to standard measure, or the equivalent, meaningful 
comparisons between variables may be made which are independent of 
the means and standard deviations. By transforming the original 
observations to a model distribution, perhaps normal or rectangular, 
comparisons may be made which are independent of the idiosyncratic
shapes of the original distributions. The transformation of original 
measurements to intelligence quotients results in a variable which is 
roughly independent of chronological age. Meaningful comparisons may 
thereby be made between children of different ages. The $z_r$ values
obtained by Fisher's transformation are independent of the population
parameter $\rho$. The reduction of data in nonparametric statistics to ranks
and signs leads, at some cost, to tests of significance which are
independent of the shapes of the population distributions. Clearly, the
essence of the idea of a transformation is the attainment of a variable 
which is independent of, or invariant with respect to, certain other 
variable properties for the purpose of achieving desired and meaningful 
comparisons. 


16.2 

Transformations to standard measure 

A standard score is a deviation from the mean divided by the standard
deviation; thus $z = (X - \bar{X})/s$. The mean is the origin, and the standard
deviation is the unit of measurement. Thus a particular value is
z standard deviation units above or below the mean. The mean of z
scores is zero, and the standard deviation is unity. The skewness and
kurtosis of the distribution are unchanged. The distribution of z scores
has the same shape as the distribution of X. Standard scores on two or
more variables are directly comparable only in the sense that they have
the same mean and standard deviation.

A standard-score transformation does not change the proportionality
of scale intervals. If $X_1$, $X_2$, and $X_3$ are three measurements in raw-score
form and $z_1$, $z_2$, and $z_3$ are the same three measurements in standard-score
form, then

$$\frac{X_1 - X_2}{X_2 - X_3} = \frac{z_1 - z_2}{z_2 - z_3}$$

This means that the relative distances between the variate values remain
unchanged under a standard-score transformation. Let $X_1$, $X_2$, and $X_3$
be 20, 30, and 50. If $\bar{X}$ = 40 and s = 15, then $z_1$, $z_2$, and $z_3$ become
-1.33, -.67, and .67. We note that

$$\frac{20 - 30}{30 - 50} = \frac{-1.33 + .67}{-.67 - .67}$$

Standard scores involve the use of decimals and plus and minus signs.
This is sometimes inconvenient. Also the range of values will seldom
exceed the limits -3 and +3. It is not uncommon to select an arbitrary
origin and standard deviation to ensure that all, or nearly all, the
measures have a plus sign and that decimals are eliminated. For this
purpose a mean of 50 and a standard deviation of 15 are sometimes used.
If z' denotes this type of score, then

$$z' = 50 + 15\left(\frac{X - \bar{X}}{s}\right) = 50 + 15z$$

To change the standard deviation we multiply every standard score by 15.
To change the origin we merely add 50. Values of z' are rounded to the
nearest integer. In comparing performance on a series of tests standard- 
score values z ' are more convenient than z. Of course, any other mean 
and standard deviation could be selected. 
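
The worked values of this section can be reproduced in a few lines. The sketch below uses the three raw scores 20, 30, and 50 with the mean of 40 and standard deviation of 15 given in the text, checks that the ratio of scale intervals is unchanged, and rescales to a mean of 50 and a standard deviation of 15. The function names are illustrative, and the new mean and standard deviation are ordinary parameters, since any other values could be selected.

```python
def standard_scores(xs, mean, sd):
    """z = (X - mean) / sd for each raw score."""
    return [(x - mean) / sd for x in xs]

def rescale(zs, new_mean=50, new_sd=15):
    """z' = new_mean + new_sd * z, rounded to the nearest integer."""
    return [round(new_mean + new_sd * z) for z in zs]

x = [20, 30, 50]
z = standard_scores(x, mean=40, sd=15)

# Proportionality of scale intervals before and after the transformation.
raw_ratio = (x[0] - x[1]) / (x[1] - x[2])
z_ratio = (z[0] - z[1]) / (z[1] - z[2])

z_prime = rescale(z)
print([round(v, 2) for v in z], raw_ratio, round(z_ratio, 4), z_prime)
```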

16.3 

Percentile points and percentile ranks 

In the standardization of psychological tests transformations to percentile 
ranks have frequent application. Such transformations are rectangular. 
Each percentile rank has the same frequency of occurrence. The fre- 
quency distribution is flat. 

A clear distinction must be made between percentile points and
percentile ranks. If k per cent of the members of a sample have scores
less than a particular value, that value is the kth percentile point. It is a
value of the variable below which k per cent of individuals lie. On an
examination, if 85 per cent of individuals score less than 60, then 60 is
the 85th percentile point. If a frequency distribution is represented
graphically and ordinates raised at all percentile points, the total area
under the frequency distribution is divided into 100 equal parts.

Percentile points may be represented by the symbols $P_0, P_1, P_2,
\ldots, P_{100}$. The points $P_0$ and $P_{100}$ are limits which include all members
of the sample. A percentile rank, as distinct from a percentile point,
is a value on the transformed scale corresponding to the percentile point.
If 60 is a score below which 85 per cent of individuals fall, then 85 is the
corresponding percentile rank. As in all transformations, values on the
original scale correspond to certain values on the transformed scale. In
the present context the values on the original scale are percentile points,
the corresponding values on the transformed scale are percentile ranks.

The reader will recall that the median is a value of the variable above
and below which 50 per cent of cases lie. The median is the 50th percentile
point, $P_{50}$. The upper quartile is a value of the variable above
which 25 per cent of cases and below which 75 per cent of cases lie;
conversely for the lower quartile. The upper quartile is the 75th percentile
point, or $P_{75}$, and the lower quartile is the 25th percentile point, or $P_{25}$.
Decile points are sometimes used. These, as the name implies, involve
a dividing into tenths. A decile point is a value of the variable below
which a certain percentage of individuals fall, the percentage being taken
in units of 10. Decile ranks are transformed values corresponding to
the decile points and taking the integer values 1 to 10. The median is
the 5th decile. An ordinate at the median divides the area under the
frequency distribution into 2 equal parts; ordinates at the upper quartile,
the median, and the lower quartile divide the area into 4 equal parts;
ordinates at the decile points divide the area into 10 equal parts; ordinates
at the percentile points divide the area into 100 equal parts.

For small N the computation of percentile points and percentile
ranks is not a very meaningful procedure. Given the scores 8, 17, 23, 42,
61, and 63, obviously little meaning could possibly attach to $P_{30}$ or $P_{80}$.
The conversion of these scores to percentile ranks would be a somewhat
spurious procedure, with no advantage over ordinary ranks.

16.4 

Computation of percentile points and ranks- 
ungrouped data 

To illustrate the computation of percentile points and ranks for ungrouped 
data, consider the psychological-test scores tabulated in Table 16.1.

Table 16.1

Psychological-test scores for a group of 60 children
arranged in order

 Individual  Score    Individual  Score    Individual  Score
      1        83         21       110         41       123
      2        88         22       110         42       124
      3        88         23       110         43       124
      4        91         24       110         44       125
      5        91         25       111         45       125
      6        93         26       112         46       125
      7        93         27       114         47       126
      8        93         28       115         48       126
      9        97         29       116         49       127
     10        98         30       116         50       128
     11        98         31       116         51       130
     12        98         32       117         52       130
     13       100         33       118         53       131
     14       101         34       119         54       132
     15       103         35       120         55       135
     16       107         36       121         56       135
     17       107         37       122         57       136
     18       108         38       123         58       136
     19       109         39       123         59       136
     20       110         40       123         60       139

We
adopt the convention that any score value X has exact limits given by
X - .5 and X + .5. The variable is presumed to be continuous. Thus
the score 116 has exact limits 115.5 and 116.5. This convention is the
same as that used in determining the exact limits of class intervals. Let
us now calculate $P_{40}$, the 40th percentile point, the point below which
40 per cent of individuals lie. N = 60, and 40 per cent of this is 24.
The 24th individual has a score of 110, the exact upper limit is 110.5,
and this is taken as the point on the test scale below which 24 individuals
lie. Thus $P_{40}$ = 110.5. Note in this case that the 25th individual has
a score of 111. The exact lower limit of this score is also 110.5. Consider
now the calculation of $P_{20}$. We require a point on the test scale
below which 12 and above which 48 individuals lie. The score of the
12th individual is 98 with an upper limit of 98.5. We note also that the
score of the 13th individual is 100 with a lower limit of 99.5. Presumably
the required point falls somewhere between 98.5 and 99.5. It is indeterminate.
As an arbitrary working procedure the percentile $P_{20}$ may be taken halfway
between these two values. Thus $P_{20}$ = 99.0. To illustrate the
handling of ties in the computation of percentile points let us calculate
$P_{10}$. A score is required below which 6 and above which 54 individuals
fall. We note that individuals 6, 7, and 8 have the same score, 93.
Thus three individuals have scores within the exact limits 92.5 and 93.5.
Since we require a point below which 6 individuals fall, we interpolate
one-third of the way into this interval. One-third of this interval is .33,
and $P_{10}$ = 92.50 + .33 = 92.83. With the above data $P_0$ may be taken
as the lower exact limit of the lowest score, or 82.5. Similarly, $P_{100}$ may
be taken as the upper exact limit of the highest score, or 139.5.
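
The working procedure just described can be sketched as a small routine. It assumes whole-number scores with exact limits X - .5 and X + .5, interpolates within the limits of a tied score, and takes the halfway point when the required count falls exactly between two different scores. The function name is hypothetical, the list of scores is Table 16.1 as read here, and k is assumed to be a whole number from 0 to 100.

```python
from collections import Counter
from fractions import Fraction

# Scores of Table 16.1, individuals 1 to 60, as read from the table.
SCORES = [
    83, 88, 88, 91, 91, 93, 93, 93, 97, 98, 98, 98, 100, 101, 103,
    107, 107, 108, 109, 110, 110, 110, 110, 110, 111, 112, 114, 115,
    116, 116, 116, 117, 118, 119, 120, 121, 122, 123, 123, 123, 123,
    124, 124, 125, 125, 125, 126, 126, 127, 128, 130, 130, 131, 132,
    135, 135, 136, 136, 136, 139,
]

def percentile_point(scores, k):
    """Return P_k for whole-number k, using exact limits X - .5 and X + .5."""
    n = len(scores)
    target = Fraction(k * n, 100)      # number of cases required below P_k
    distinct = sorted(Counter(scores).items())
    below = 0
    for index, (value, freq) in enumerate(distinct):
        lower, upper = value - 0.5, value + 0.5
        cumulative = below + freq
        if target < cumulative:
            # Interpolate within the exact limits of this (possibly tied) score.
            return lower + float((target - below) / freq)
        if target == cumulative:
            if index + 1 == len(distinct):
                return upper           # P_100: upper limit of the highest score
            # Indeterminate case: take the point halfway to the next score.
            next_lower = distinct[index + 1][0] - 0.5
            return (upper + next_lower) / 2
        below = cumulative
    raise ValueError("k must lie between 0 and 100")

for k in (0, 10, 20, 40, 100):
    print(k, round(percentile_point(SCORES, k), 2))
```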

The calculation of percentile ranks as distinct from percentile points
is the reverse of the above process. Above we calculated scores corresponding
to particular ranks. We may now attend to the calculation
of ranks corresponding to particular scores. To illustrate, consider
individual 32 in Table 16.1. This individual is 32d from the bottom.
His test score is 117. The number of individuals scoring below 117 is 31.
The percentage below is 31/60 × 100 = 51.67. The number scoring above
117 is 28. The percentage is 28/60 × 100 = 46.67. These two percentages
do not add to 100. Individual 32 occupies 1/60 × 100 = 1.67 per cent
of the total scale. His percentile rank falls between 51.67 and

51.67 + 1.67 = 53.34

We may take the mid-point of this interval as the required percentile
rank. Thus the percentile rank corresponding to score 117 is

51.67 + 1.67/2 = 52.50

This method assumes that any rank R covers the interval R - .5 to
R + .5.

Consider the question of ties. We note that five individuals score
110. The number of individuals scoring below 110 is 19, or

19/60 × 100 = 31.67 per cent

of the total. The number scoring above 110 is 36, or

36/60 × 100 = 60.00 per cent

The number occupying the score position 110 is 5, or 5/60 × 100 = 8.33 per
cent. The required percentile rank may be taken as the mid-point of the
interval 31.67 to 31.67 + 8.33 = 40.00. Thus the percentile rank of
the score 110 is 31.67 + 8.33/2 = 35.83.

Percentile ranks may be obtained by using the simple formula

$$PR = 100\,\frac{R - .5}{N} \tag{16.1}$$

where R = rank of individual, counting from the bottom
      N = total number of cases

Where ties occur, R is taken as the average rank which the tied observations
occupy. The average rank of the five individuals who score 110 is
22, and the corresponding percentile rank is, as before,

$$100\,\frac{22 - .5}{60} = 35.83$$

Percentile ranks are ordinarily rounded to the nearest whole number.
Thus the rank 35.83 becomes 36.
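
Formula 16.1, with the average-rank rule for ties, can be sketched as follows. The function name is illustrative, and the list of scores is Table 16.1 as read here.

```python
# Scores of Table 16.1, individuals 1 to 60, as read from the table.
SCORES = [
    83, 88, 88, 91, 91, 93, 93, 93, 97, 98, 98, 98, 100, 101, 103,
    107, 107, 108, 109, 110, 110, 110, 110, 110, 111, 112, 114, 115,
    116, 116, 116, 117, 118, 119, 120, 121, 122, 123, 123, 123, 123,
    124, 124, 125, 125, 125, 126, 126, 127, 128, 130, 130, 131, 132,
    135, 135, 136, 136, 136, 139,
]

def percentile_rank(score, scores):
    """PR = 100 (R - .5) / N, with R the average rank of tied scores."""
    n = len(scores)
    below = sum(1 for s in scores if s < score)
    tied = sum(1 for s in scores if s == score)
    if tied == 0:
        raise ValueError("score does not occur in the data")
    # Tied scores occupy ranks below + 1 ... below + tied; take their mean.
    average_rank = below + (tied + 1) / 2
    return 100 * (average_rank - 0.5) / n

print(round(percentile_rank(117, SCORES), 2))  # single score, rank 32
print(round(percentile_rank(110, SCORES), 2))  # five tied scores, average rank 22
```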

16.5

Calculation of percentile points and ranks — 
grouped data 

The calculation of percentile points and ranks for grouped data will be
discussed with reference to the data of Table 16.2. Cumulative frequencies
are recorded in col. 3, and cumulative percentages in col. 4.

Let us calculate $P_{25}$. N = 200, and 25 per cent of N is 50. We
observe that the 50th case falls within the interval 65 to 69. The exact
limits of this interval are 64.5 and 69.5. We must now interpolate within
the interval to locate a point below which 50 cases fall. We note that
36 cases fall below and 26 cases within the interval containing $P_{25}$. To
arrive at the 50th case we require 14 of the cases within the interval.
Thus we take 14/26 of the interval 64.5 to 69.5. This is 14/26 × 5 = 2.69.
We add this to the lower limit of the interval to obtain $P_{25}$, which is
64.5 + 2.69 = 67.19.

The following formula may be used to calculate percentile points:

$$P_i = L + \frac{pN - F}{f_i} \times h \tag{16.2}$$

where $P_i$ = ith percentile point
      p = proportion corresponding to ith percentile
          point; thus if i = 62, p = .62
      L = exact lower limit of interval containing $P_i$
      F = sum of all frequencies below L
      $f_i$ = frequency of interval containing $P_i$
      h = class interval

For $P_{25}$ in Table 16.2 we have L = 64.5, p = .25, F = 36, $f_i$ = 26, and
h = 5. Thus

$$P_{25} = 64.5 + \frac{.25 \times 200 - 36}{26} \times 5 = 67.19$$

This result is identical with that obtained previously for $P_{25}$. The reader
will observe that for $P_{50}$ this formula is the same as that given previously
for calculating the median from grouped data.
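
Formula 16.2 can be applied mechanically to a grouped frequency table. The sketch below is one possible arrangement, not a prescribed routine: the intervals of Table 16.2 are entered from lowest to highest by their lower score limits, and the function name and variable names are illustrative.

```python
# Table 16.2, lowest interval first: lower score limit and frequency.
LOWER_SCORES = [40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95]
FREQS = [2, 4, 6, 10, 14, 26, 50, 40, 33, 8, 6, 1]
H = 5  # class interval

def percentile_point_grouped(k, lower_scores, freqs, h):
    """P_k = L + ((pN - F) / f) * h, as in formula 16.2."""
    n = sum(freqs)
    target = k * n / 100          # pN, the number of cases below P_k
    below = 0                     # F, the cases below the current interval
    for low, f in zip(lower_scores, freqs):
        if f > 0 and target <= below + f:
            exact_lower = low - 0.5   # L, exact lower limit of the interval
            return exact_lower + (target - below) / f * h
        below += f
    raise ValueError("k must lie between 0 and 100")

print(round(percentile_point_grouped(25, LOWER_SCORES, FREQS, H), 2))
print(round(percentile_point_grouped(50, LOWER_SCORES, FREQS, H), 2))
```

The second call gives the median of the grouped data, since for $P_{50}$ the formula reduces to the grouped-data median.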


Table 16.2

Cumulative frequencies and percentages of test scores

      1            2             3              4
    Class                   Cumulative     Cumulative
   interval    Frequency     frequency     percentage
    95-99          1            200           100.0
    90-94          6            199            99.5
    85-89          8            193            96.5
    80-84         33            185            92.5
    75-79         40            152            76.0
    70-74         50            112            56.0
    65-69         26             62            31.0
    60-64         14             36            18.0
    55-59         10             22            11.0
    50-54          6             12             6.0
    45-49          4              6             3.0
    40-44          2              2             1.0
    Total        200


The calculation of percentile ranks is the reverse of the above procedure.
The cumulative percentages shown in col. 4 of Table 16.2 are
the percentile ranks corresponding to the exact top limits of the intervals.
Thus 56.0 is the percentile rank corresponding to the percentile point
74.5, the exact top limit of the interval 70 to 74. Likewise 11.0 is the
percentile rank corresponding to the percentile point 59.5, the exact top
limit of the interval 55 to 59. The percentile rank of any score may be
obtained by interpolation. What is the percentile rank corresponding to
the score 81? The score 81 falls within an interval with exact limits
79.5 and 84.5. It is 1.5 score units above the bottom of this interval.
The lower limit has a percentile rank of 76.0 and the upper limit 92.5.
Thus we have two points on the score scale corresponding to two points
on the percentile-rank scale. Five units on the score scale is equal to
92.5 - 76.0 = 16.5 units on the percentile-rank scale, and 1.5 units on
the score scale is equal to (92.5 - 76.0)1.5/5 = 4.95 units on the rank
scale. We now take 76.0 + 4.95 = 80.95 as the percentile rank of the
score 81. Rounding this to the nearest integer we obtain a rank of 81.
It is pure coincidence that in this case the percentile rank is numerically
equal to the score.

The steps involved in finding percentile ranks from grouped data 
may be summarized as follows: 

1 Find the exact lower limit of the interval containing 
the score X whose percentile rank is required. 

2 Find the difference between X and the lower limit of 
the interval containing it. 

3 Divide this by the class interval and multiply by the
percentage within the interval.

4 Add this to the percentile rank corresponding to the
bottom of the interval.
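
The four steps above can be sketched as a short routine. The intervals of Table 16.2 are entered from lowest to highest by their lower score limits; the function name and variable names are illustrative only.

```python
# Table 16.2, lowest interval first: lower score limit and frequency.
LOWER_SCORES = [40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95]
FREQS = [2, 4, 6, 10, 14, 26, 50, 40, 33, 8, 6, 1]
H = 5  # class interval

def percentile_rank_grouped(score, lower_scores, freqs, h):
    """Percentile rank of a score by linear interpolation within its interval."""
    n = sum(freqs)
    below = 0
    for low, f in zip(lower_scores, freqs):
        exact_lower = low - 0.5                       # step 1
        if exact_lower <= score < exact_lower + h:
            distance = score - exact_lower            # step 2
            pct_within = 100 * f / n
            increment = distance / h * pct_within     # step 3
            pct_below = 100 * below / n
            return pct_below + increment              # step 4
        below += f
    raise ValueError("score lies outside the tabulated intervals")

rank_81 = percentile_rank_grouped(81, LOWER_SCORES, FREQS, H)
print(round(rank_81, 2), round(rank_81))
```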

Usually, where percentile ranks are calculated we are interested in
preparing a table for converting any score value to percentile ranks.
Thus for every possible score, we require the corresponding percentile
rank. This may be done by systematically computing all percentile-rank
values in the manner described above. A somewhat easier procedure
is to make a graphical plotting on suitable graph paper of cumulative
percentages against the corresponding upper limits of the class
intervals. Score values are plotted on the horizontal axis, and cumulative
percentages on the vertical axis. The points may be joined by
straight lines. Percentile ranks corresponding to scores may then be
read directly from the graph. If the points are joined by straight lines,
these rank values will be the same, within limits of error, as those obtained
by linear interpolation directly on the numerical values. If the sample is
small, the points when plotted may show considerable irregularity and
it may be advisable to fit a smoothed curve to the data. The fitting of a
smoothed curve by freehand methods is accurate enough for most
practical purposes. A procedure related to the method described above
is to calculate certain selected percentile points and then interpolate either
numerically or graphically between these points. The percentile points
$P_{10}, P_{20}, P_{30}, \ldots, P_{90}$ may be calculated. To achieve greater accuracy
at the tails of the distribution it may be desirable to calculate $P_1$ and $P_5$;
also $P_{95}$ and $P_{99}$.


16.6 

Normal transformations 

The transformation of a variable to the normal form is a frequent pro- 
cedure in test standardization and correlational analysis. Not uncom- 
monly, test norms are normal transformations of the original raw scores 
with arbitrarily selected means and standard deviations. A type of 
normal transformation used by educationists is a T score. T scores are 
normally distributed, usually with a mean of 50 and a standard deviation 
of 10. A normal transformation with a mean of 100 and a standard 
deviation of 15, or thereabouts, resembles an IQ scale. 


Table 16.3

Points on the base line of the unit normal curve
corresponding to selected percentile ranks

Percentile rank     Standard deviation units
      99                    +2.33
      95                    +1.65
      90                    +1.28
      80                    +0.84
      70                    +0.52
      60                    +0.25
      50                     0.00
      40                    -0.25
      30                    -0.52
      20                    -0.84
      10                    -1.28
       5                    -1.65
       1                    -2.33





Transforming a set of scores to the normal form is a relatively simple 
procedure. Every percentile rank corresponds to a point on the base line 
of the unit normal curve measured from a mean of zero in standard devia- 
tion units. A percentile rank of 50 corresponds to the zero point. A 
rank of 60 is .25 standard deviation units above the mean. A rank of 
70 is .52 standard deviation units above the mean. Table 16.3 shows 
points on the base line of the unit normal curve corresponding to selected
percentile ranks. These and other points are readily obtained from any 
table of areas under the normal curve (Table A of the Appendix). 
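
Where a printed table of normal areas is not at hand, the same points may be computed. The sketch below uses `NormalDist.inv_cdf` from the Python standard library; the function name `z_for_percentile_rank` is an assumption of this example. The computed values agree with Table 16.3 to within .01; the one visible difference is at ranks 95 and 5, where the computation rounds 1.645 to 1.64 and the table gives 1.65.

```python
from statistics import NormalDist


def z_for_percentile_rank(rank):
    """Point on the base line of the unit normal curve, in standard
    deviation units, below which `rank` per cent of the area falls."""
    return NormalDist().inv_cdf(rank / 100.0)


ranks = [99, 95, 90, 80, 70, 60, 50, 40, 30, 20, 10, 5, 1]
table = {rank: round(z_for_percentile_rank(rank), 2) for rank in ranks}
for rank, z in table.items():
    print(f"{rank:>3}  {z:+.2f}")
```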

In summary, the steps used in transforming a variable to the normal 
form are as follows. Percentile ranks corresponding to certain points on 
the score scale may be calculated. A table of areas under the normal 
curve is used to find the points on the base line of the unit normal curve
corresponding to these percentile ranks. These points correspond to the
percentile points on the original score scale. Thus a correspondence is
established between a set of points on the original score scale and points
on a normal distribution of zero mean and unit standard deviation.
Percentile ranks are stepping stones in establishing this correspondence.
The normal standard scores are multiplied by a constant to obtain any 
desired standard deviation of the transformed values. A constant is 
usually added to produce a change in means, thus eliminating negative 
signs. A transformed value corresponding to any score value on the 
original scale may be obtained by interpolation. 

Some freedom of choice is possible in the selection of a set of points
on the score scale with associated percentile ranks. First, we may use the
exact top limits of the intervals and obtain the corresponding percentile
ranks from the cumulative-percentage frequencies. Second, we may take
the mid-points of the class intervals and obtain percentile ranks
corresponding to these. Third, we may use a selected set of percentile points
with associated percentile ranks. Thus $P_{10}$, $P_{20}$, $P_{30}$, ..., $P_{90}$
may be used. $P_1$, $P_5$ and $P_{95}$, $P_{99}$ may be added at the tails as a
refinement. Fourth, we may select certain equally spaced points on the normal
standard-score scale and ascertain their percentile ranks and the corresponding
percentile-point scores. These equally spaced points may, for example,
be -2.5, -2.0, -1.5, ..., +1.5, +2.0, +2.5. The difference between
the four alternatives outlined above is a matter of units. The first uses
units of class interval of the original variable, a unit extending from the
top of one interval to the top of the next. The second also uses units of
class interval of the original variable, a unit extending from the mid-point
of one interval to the mid-point of the next. The third, excluding the
tails, uses equal units on the percentile-rank scale. The fourth alternative
uses equal units on the normal standard-score scale. While minor
advantages may be claimed for one procedure in preference to another,
the differences, where N is fairly large, are trivial. Any one of the four
procedures is satisfactory enough for most practical purposes.

To illustrate the transformation of a set of scores to the normal form 
we shall use the second alternative and take the mid-points of the class 
intervals with their corresponding percentile ranks. Table 16.4 shows a 
frequency distribution of test scores. Column 2 shows the exact mid- 
points of the class intervals. Column 3 shows the frequencies. Column 
4 shows the cumulative frequencies to the mid-points. These are the 
cumulative frequencies to the bottom of the interval plus half the fre- 
quencies within the interval. The number of cases below the interval 
60 to 64 is 22. The number within the interval is 14. Half this number 
is 7. The cumulative frequency to the mid-point is 22 + 7 = 29. 
Column 5 shows the cumulative percentage frequencies to the mid-points. 
These cumulative percentage frequencies are percentile ranks corresponding
to the mid-points of the intervals. The numbers in col. 6 are points
on the base line of the unit normal curve in standard deviation units
from a zero mean. The percentage of the area of the unit normal curve
falling below a standard score of 2.81 is 99.75, the percentage below a
standard score of 2.06 is 98.00, and so on. These values are normalized
standard scores corresponding to the mid-points of the original score
intervals. Thus we have a set of values on the original scale paired with
a set of values on a normal transformed scale. Transformed values
corresponding to any score on the original scale may be obtained by either
arithmetical or graphical interpolation.


Table 16.4

Illustration of the transformation of scores to a normal
distribution: data of Table 16.2

Class      Mid-    Frequency   Cumulative     Cumulative     Normal       z x 10   T score
interval   point               frequency      percentage     standard              z x 10 + 50
                               to mid-point   to mid-point   deviation
                                                             unit z
(1)        (2)     (3)         (4)            (5)            (6)          (7)      (8)

95-99      97        1         199.5          99.75           2.81         28.1    78.1
90-94      92        6         196.0          98.00           2.06         20.6    70.6
85-89      87        8         189.0          94.50           1.60         16.0    66.0
80-84      82       33         168.5          84.25           1.00         10.0    60.0
75-79      77       40         132.0          66.00           0.41          4.1    54.1
70-74      72       50          87.0          43.50          -0.16         -1.6    48.4
65-69      67       26          49.0          24.50          -0.69         -6.9    43.1
60-64      62       14          29.0          14.50          -1.06        -10.6    39.4
55-59      57       10          17.0           8.50          -1.37        -13.7    36.3
50-54      52        6           9.0           4.50          -1.70        -17.0    33.0
45-49      47        4           4.0           2.00          -2.05        -20.5    29.5
40-44      42        2           1.0           0.50          -2.58        -25.8    24.2

Total              200

Table 16.4 shows a T-score transformation. In col. 7 the standard
scores of col. 6 are multiplied by 10, thus yielding transformed scores with
a standard deviation of 10. In col. 8 a constant value 50 is added to the
values of col. 7, thus changing the origin from zero to 50 and eliminating
negative values. If we had multiplied by 15 and added 100, the transformed
values would have a standard deviation of 15 and a mean of 100.
Any other standard deviation and mean could be used.
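
The columns of Table 16.4 may be reproduced by computation. The following Python sketch is offered under stated assumptions: the function names are hypothetical, the normal points come from `NormalDist.inv_cdf` in the standard library rather than from Table A, and the frequencies are entered from the lowest interval upward. For the interval 85-89 the cumulative percentage to the mid-point is 189.0/200 = 94.5, which gives a z of about 1.60 and a T score of about 66. Other computed values may differ from tabled ones in the last decimal place because of rounding.

```python
from statistics import NormalDist

freqs = [2, 4, 6, 10, 14, 26, 50, 40, 33, 8, 6, 1]   # intervals 40-44 ... 95-99
midpoints = [42 + 5 * i for i in range(len(freqs))]   # col. 2


def t_scores_at_midpoints(freqs, mean=50.0, sd=10.0):
    """Transformed score (mean + sd * z) at the mid-point of each interval."""
    total = sum(freqs)
    below = 0
    scores = []
    for f in freqs:
        cum_to_mid = below + f / 2.0               # col. 4
        pct = 100.0 * cum_to_mid / total           # col. 5, the percentile rank
        z = NormalDist().inv_cdf(pct / 100.0)      # col. 6
        scores.append(mean + sd * z)               # cols. 7 and 8
        below += f
    return scores


def interpolate(score, xs, ys):
    """Linear interpolation between paired points; xs must be ascending."""
    pairs = list(zip(xs, ys))
    for (x0, y0), (x1, y1) in zip(pairs, pairs[1:]):
        if x0 <= score <= x1:
            return y0 + (y1 - y0) * (score - x0) / (x1 - x0)
    raise ValueError("score lies outside the range of the mid-points")


t = t_scores_at_midpoints(freqs)
print([round(v, 1) for v in t])
print(round(interpolate(81, midpoints, t), 1))     # T score for a raw score of 81
```

Calling `t_scores_at_midpoints(freqs, mean=100.0, sd=15.0)` gives the alternative scale with a mean of 100 and a standard deviation of 15.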


16.7 

The stanine scale 

During World War II the United States Army Air Force Aviation Psychology
Program used a stanine scale. Scores on psychological tests were
converted to stanines. A stanine scale is an approximately normal
transformation. A coarse grouping is used, only nine score categories
being allowed. The transformed values are assigned the integers 1 to 9.
The mean of a stanine scale is 5, and the standard deviation is 1.96. The
percentages of cases in the stanine-score categories from 1 to 9 are 4, 7, 12,
17, 20, 17, 12, 7, and 4. Thus 4 per cent have a stanine score 1, 7 per
cent a score 2, 12 per cent a score 3, and so on. If a set of scores is ordered
from the lowest to the highest, the lowest 4 per cent assigned a score 1,
the next lowest 7 per cent a score 2, the next lowest 12 per cent a score 3,
and the process continued until the top 4 per cent receives a score of 9,
the transformed scores
are roughly normal and form a stanine scale. Stanine scores correspond
to equal intervals in standard deviation units on the base line of the unit
normal curve. A stanine of 5 covers the interval from -.25 to +.25 in
standard deviation units. Roughly 20 per cent of the area of the unit
normal curve falls within this interval. A stanine of 6 covers the interval
+.25 to +.75 in standard deviation units. Roughly 17 per cent of the
area of the unit normal curve falls within this interval. The interval
used is one-half a standard deviation unit; a stanine of 9 includes all cases
above +1.75, and a stanine of 1 all cases below -1.75 standard deviation
units. Test scores can rapidly be converted to stanines. A stanine
transformation is a simple method of converting scores to an approximate
normal form. The grouping, although coarse, is sufficiently refined for
many practical purposes.
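
A stanine conversion may be sketched in Python in two ways: by rank, using the percentages 4, 7, 12, 17, 20, 17, 12, 7, and 4, or from a standard score, using bands one-half a standard deviation wide centred on stanine 5 (so that the outermost boundaries fall at -1.75 and +1.75). The function names are assumptions of this example, and the rank method as written makes no special provision for tied scores.

```python
import math
from statistics import NormalDist

CUMULATIVE_PCT = [4, 11, 23, 40, 60, 77, 89, 96, 100]   # running totals of 4, 7, 12, ...


def stanines_by_rank(scores):
    """Assign 1 to the lowest 4 per cent, 2 to the next 7 per cent, and so on."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    result = [0] * n
    for position, i in enumerate(order):
        pct = 100.0 * (position + 0.5) / n               # mid-rank percentage
        result[i] = next(s for s, c in enumerate(CUMULATIVE_PCT, start=1) if pct <= c)
    return result


def stanine_from_z(z):
    """Stanine for a standard score z, using half-unit bands centred on 5."""
    return max(1, min(9, math.floor(z / 0.5 + 0.5) + 5))


# Percentage of the unit normal curve falling in each of the nine bands.
nd = NormalDist()
edges = [-1.75 + 0.5 * k for k in range(8)]              # -1.75, -1.25, ..., +1.75
cdf = [0.0] + [nd.cdf(e) for e in edges] + [1.0]
band_pcts = [100.0 * (hi - lo) for lo, hi in zip(cdf, cdf[1:])]
print([round(p) for p in band_pcts])
```

The band percentages computed in the last lines, when rounded to whole numbers, reproduce the percentages on which the rank method is based.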

16.8 

Regression transformations 

The data resulting from certain psychological experiments are comprised 
of a set of initial measurements, obtained in the absence of an experi- 
mental treatment, and a set of subsequent measurements obtained on the 
same subjects in the presence of an experimental treatment. These
latter measurements are a function both of the initial measurements and 
the effects of the experimental treatment. The investigator may wish to 
transform the measurements obtained under the treatment to a new 
variable which is independent of the initial measurements, the trans- 
formed variable being the object of further analysis. To illustrate, meas- 
ures of motor performance may be obtained both in the absence and the 
presence of a stress agent. The scores obtained under stress conditions 
are not independent of the initial scores. A person may have a low score 
under stress because his initial level of motor performance is low, or he 
may have a high score because his initial level is high, quite apart from the
effects of the stress agent. We require a transformation that removes the
effect of the initial values. The variation in the transformed measure- 
ments is presumably the result of the stress agent, the effects of initial 
level of performance being removed. 

Various approaches to this problem have been used. Some investi- 
gators have employed difference scores, the presumption here being that 
the increase or decrease in score over the initial value must result from the 
experimental treatment. Other investigators have used ratio scores. 
These methods do not achieve independence with respect to initial values. 
A straightforward approach to this problem is to remove the effects of 
initial values by simple linear regression, assuming of course that a linear- 
regression model is appropriate to the data. 

Let $X_0$ and $X_1$ be scores obtained under the two conditions. Let $z_0$
and $z_1$ be the corresponding standard scores. The regression equation for
predicting $z_1$ from $z_0$ is $z_1' = r_{01} z_0$, where $r_{01}$ is the correlation
between measures obtained under the two conditions, and $z_1'$ is a standard
score predicted from the initial values. The values $z_1'$ are points on the
regression line used in predicting $z_1$ from $z_0$. The difference between $z_1$
and $z_1'$ is a deviation from the regression line and may be written as
$z_1 - r_{01} z_0$. These deviations are transformed values which are
uncorrelated with, and in that sense quite independent of,
the initial values. The effect of initial performance level has been
removed. The variation in the transformed values results from the
experimental condition plus error. Of course, in any practical situation
the data may be contaminated by other factors unless adequate controls
are exercised.

The scores $z_1 - r_{01} z_0$ are errors of estimation with zero mean and a
standard deviation given by $\sqrt{1 - r_{01}^2}$. They may be expressed in
standard-score form by writing

$$\delta = \frac{z_1 - r_{01} z_0}{\sqrt{1 - r_{01}^2}} \tag{16.3}$$

In this form they may be referred to as $\delta$ scores, or delta scores. These
transformed scores have a mean of zero and a standard deviation of unity.
Their skewness and kurtosis are not a simple matter. Such scores may be
multiplied by a constant to obtain any desired standard deviation. Any
constant may be added to change the mean.

This type of simple regression transformation is quite general and is
applicable in many situations where we wish to remove the effects of one
variable on another. It has been used effectively by Lacey (1956) in the
statistical treatment of autonomic-response data.
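
The calculation of delta scores may be sketched in Python. The function name and the small data set below are hypothetical and serve only as illustration. The sketch assumes that standard deviations and the correlation are computed with N in the denominator; under that assumption the resulting scores have a mean of exactly zero, a standard deviation of exactly unity, and a zero correlation with the initial scores.

```python
import math
from statistics import mean, pstdev


def delta_scores(initial, treated):
    """Standardized deviations of the treated scores from the regression
    line on the initial scores: (z1 - r01 * z0) / sqrt(1 - r01 ** 2)."""
    n = len(initial)
    m0, m1 = mean(initial), mean(treated)
    s0, s1 = pstdev(initial), pstdev(treated)      # N in the denominator
    z0 = [(x - m0) / s0 for x in initial]
    z1 = [(x - m1) / s1 for x in treated]
    r01 = sum(a * b for a, b in zip(z0, z1)) / n   # correlation between conditions
    scale = math.sqrt(1.0 - r01 ** 2)
    return [(b - r01 * a) / scale for a, b in zip(z0, z1)], r01


# Hypothetical initial and under-treatment scores for eight subjects.
initial = [12, 15, 9, 20, 17, 11, 14, 18]
treated = [10, 16, 8, 15, 18, 7, 13, 14]
deltas, r01 = delta_scores(initial, treated)
print(round(r01, 3), [round(d, 2) for d in deltas])
```

Multiplying each delta score by 10 and adding 50 would place the scores on a scale with a mean of 50 and a standard deviation of 10.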


16.9 

Transformations with age allowances 

Any detailed consideration of a score transformation with age allowances
is beyond the scope of this book. A few comments may, however, be
appropriate. This transformation is a variant of the regression transformation
described in the previous section. Its purpose is to achieve
comparability between children of different ages by transforming to a
variable which is independent of chronological age. An older child A
may have greater ability than a younger child B. Relative to his age
group, however, his ability may be appreciably less. We require an
answer to the question, how would child A compare with child B if both
were the same age? This question is answered by a transformation to
a variable which is independent of chronological age. Age transformations
usually incorporate a normalizing process. The transformed scores
are normally distributed with a fixed mean and standard deviation.

Such a transformation may be effected in a variety of ways. One
method involves the following general steps. Obtain the frequency
distributions of scores for each age group. If the age group is restricted to
1 year, say 11, a frequency distribution may be prepared for each month
of age, 11 years 0 months, 11 years 1 month, 11 years 2 months, and so on.
For a group covering a wider age range, 3- or 6-month intervals may be
used. The next step is to compute certain selected percentile points for
each frequency distribution. Let us calculate the 5th, 16th, 50th, 84th,
and 95th percentile points. These percentiles correspond roughly to the 
points on the base line of the unit normal curve of —1.65, —1.00, 0.00, 
+ 1.00, and +1.65. If greater accuracy is required, additional percentile 
points may be calculated. We thus have 5 percentile points for each age 
group. We may now fit lines to these percentile points, using either
mathematical or graphical methods. Thus we fit a line to all the
5th-percentile points, another line to the 16th-percentile points, another line
to the 50th-percentile points, and so on. These lines describe the increase 
in score with increase in age at each percentile-rank level. For a fairly 
narrow age range a straight line may prove an adequate fit to the data. 
Such a line may be fitted by the method of least squares. For a broad 
age range the lines may exhibit certain curvilinear properties and it may 
be advisable to fit a smooth curve to the points using graphical methods. 
These percentile lines smooth out irregularities in the data. The original 
percentile points are replaced by points on these lines. Let us now 
assume that we require a transformed variable with a mean of 100 and a 
standard deviation of 15. All percentile points on the 50th-percentile 
line correspond to a score of 100 on the transformed variable. All
percentile points on the 84th-percentile line correspond to a score of 115 on
the transformed variable. All points on the 95th-percentile line
correspond to a score of 125 on the transformed variable. Points on the
5th- and 16th-percentile lines correspond to scores of 75 and 85 on the
transformed variable. Thus for each age group we have a set of
percentile points, points on a fitted line, and a corresponding set of
transformed values. By interpolation and extrapolation a transformed value
corresponding to each original score value may be obtained and a con- 
version table prepared. 

Transformed scores obtained by this general method will be approxi- 
mately normal with a mean of 100 and a standard deviation of 15. Any 
other appropriate mean and standard deviation may be used. The 
transformed scores are independent of age. The correlation between 
chronological age and transformed score is about zero. 

Many variants and refinements of this general method may be 
applied. Many investigators may prefer to use a larger number of per- 
centile lines and equal standard-score units. 
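
Two parts of this procedure lend themselves to a short Python sketch: the transformed score attached to each percentile line, and the fitting of one such line by least squares. The function names and the age and score values are hypothetical, chosen only for illustration. The normal points are computed with `NormalDist.inv_cdf` rather than read from a table, so the transformed scores agree with those quoted above only after rounding to whole numbers.

```python
from statistics import NormalDist


def transformed_score(percentile, mean=100.0, sd=15.0):
    """Score on the transformed scale attached to a given percentile line."""
    return mean + sd * NormalDist().inv_cdf(percentile / 100.0)


targets = {p: round(transformed_score(p)) for p in (5, 16, 50, 84, 95)}
print(targets)


def least_squares_line(xs, ys):
    """Slope and intercept of the least-squares line of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical 50th-percentile points for four 3-month age groups.
ages_months = [132, 135, 138, 141]
p50_points = [40.0, 42.5, 43.5, 46.0]
slope, intercept = least_squares_line(ages_months, p50_points)
smoothed = [intercept + slope * age for age in ages_months]
print([round(v, 2) for v in smoothed])
```

The smoothed values replace the original percentile points for each age group; every point on this particular line would then correspond to a transformed score of 100.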

EXERCISES 

1 What characteristics of a set of measurements are invariant under (a)
a transformation to standard scores, (b) a normal transformation, (c)
a regression transformation?





2 State the difference between percentile points and percentile ranks. 

3 For the data of Table 16.1, compute (a) percentile points Pm, P 50 , and 
Pn and (b) percentile ranks for scores 103, 123, and 136.

4 For the data of Table 16.2, compute (a) percentile points Pi 0 , /’so, 
and P # 0 and (b) percentile ranks for the scores 69, 74, and 82. 

5 Develop a T-score transformation for the data of Table 3.1. 

6 Develop a stanine transformation for the data of Table 3.1. 

7 The following are measures of motor skill under initial nonstress con- 
ditions and subsequent stress conditions for a sample of 12 individuals. 

Nonstress: 26 33 41 63 28 36 44 28 62 47 69 37

Stress:    18 29 62 40 26 30 38 60 41 39 60 46

Apply a regression transformation to these data. What purpose 
would such a transformation serve? 



The Structure 
and Planning 
of Experiments 



17.1

Introduction


Several subsequent chapters of this book are concerned with the analysis 
of variance and covariance. These procedures are used in the analysis 
of the data of experiments. A very brief, and elementary, discussion of 
the structure and planning of experiments should serve as a useful pre- 
liminary to a detailed study of these methods of analysis. The study of 
the structure and planning of experiments is a field of investigation com- 
monly called the design of experiments. This subject has many aspects, 
some quite complex. The present discussion will deal* in a non mathe- 
matical way with only a few of the simplest aspects of experimental 
design. 

All experiments are concerned with the relations between variables. 
In the simplest type of experiment two variables only are involved, an 
independent variable and a dependent variable. To illustrate, an 
experiment may be initiated to compare three different methods of
teaching French, designated A, B, and C. Each method may be applied to
a different group of experimental subjects. Following a period of instruc- 
tion, performance may be measured using an achievement test. In this 
experiment the different methods of teaching French constitute the 
levels of the independent variable. The investigator decides which 
methods will be used and the size of the groups to which they will be 
applied; that is, he controls the values of the independent variable and 
the frequency of occurrence of those values. The measures of achieve- 
ment constitute the dependent variable. The experiment is concerned 
with the way in which achievement in French depends on the method of 
instruction used. The essence of the idea of an experiment lies in the 
simple fact that the investigator selects the values of the treatment varia- 
ble and the frequency of their occurrence. This enables him to study an 
indefinitely large number of relations which are not amenable to study by 
observational or correlational methods, and may not in fact have any 
existence in nature at all prior to the conduct of the experiment. New
knowledge is thereby produced. The tremendous proliferation of human
knowledge in the past 20 years is in large measure a result of experimenta- 
tion. It is clearly desirable, therefore, that some of the principles under- 
lying such experimentation should be understood. 

In developing the design for an experiment, the investigator must (1) 
select the values of the independent variable, or variables, to be com- 
pared; (2) select the subjects for the experiment; (3) apply rules or pro- 
cedures whereby subjects are assigned to the particular values of the inde- 
pendent variable; (4) specify the observations or measurements to be 
made on each subject. In the experiment mentioned above, on the 
comparison of different methods of teaching French, the investigator 
must select the different methods of teaching to be compared. He must 
choose the experimental subjects to which these methods are applied.
He must allocate subjects to methods. Also decisions must be made 
regarding appropriate measures of achievement which will yield valid 
comparisons of the methods used. 


17.2 

Terminology 

Comment on terminology is appropriate here. An independent variable
used in an experiment may be either a treatment variable or a classification
variable. A treatment variable involves a modification in the experimental
subjects, a modification which is controlled by the experimenter.
Different dosages of a drug or different methods of learning are administered
to different groups of subjects. In effect, the subjects are treated
in some way by the experimenter. Experimental subjects may, however,
be classified on a characteristic which was present prior to, and quite
apart from, the experiment, and does not result from the manipulations
of the experimenter. Such a variable is a classification variable. Examples
are sex, age, disease entity, IQ level, socioeconomic status, and so on.
Although the values of a classification variable are not, as it were, created
by the experimenter, as is the case with a treatment variable, the investigator
nonetheless selects the classification variables which are included
in the experiment or are the objects of attention.

Any independent variable, whether of the treatment or classification
type, is spoken of as a factor. Experiments which investigate
simultaneously the effects of two independent variables are spoken of as
two-factor, or two-way classification, experiments. When three factors
are involved, the experiment is said to be a three-factor, or three-way
classification, experiment, and so on. The different values of the independent
variable are spoken of as levels; thus we may have two, or three,
or more levels of a factor.

In the literature on experimental design the unit to which a treat- 
ment is applied is frequently spoken of as a plot, a term which derives 
from agricultural experimentation. In experimental work in psychology 
and education the plot is usually a human subject or an experimental 
animal. The term “plot” will be used infrequently in this text. Meas- 
urements obtained from a plot are sometimes spoken of as the yield, a 
term which stems also from agricultural usage. In psychology and 
education the measurements or observations made on a group of subjects 
or animals correspond to the yield for a number of plots. A grouping of 
experimental units which is homogeneous with regard to some basis of 
classification is spoken of as a block. An experimenter may select 50 
subjects, of which 25 are male and 25 female. The two groupings by 
sex are blocks. 


17.3

The classification of variables in relation
to experiments

As indicated above, experiments are concerned with the relations between 
variables. In Chap. 1 of this book variables were classified as nominal, 
ordinal, or interval-ratio types. In a simple two-variable experiment 
the treatment variables may be nominal, ordinal, or interval-ratio; the 
dependent variable may also be nominal, ordinal, or interval-ratio. The 
methods to be applied in the analysis of the data of experiments and the 
type of questions which experiments can answer are determined by the 
nature of the variables. Thus, in effect, the nature of the information 
which an experiment can yield, and the analytic methods which communi- 
cate that information to our understanding, depend on the nature of 
the variables. 

To illustrate, consider an experiment in which three types of therapy 
are applied to three groups of depressed patients. After a time period, 
observations are made indicating those subjects that show evidence of 
recovery and those that do not. In this experiment both variables are 
nominal. The investigator may calculate the proportions in the three
groups that show recovery and compare these proportions one with 
another. The magnitude of the differences between proportions is a 
measure of the difference between treatments. No further analysis of 
these data is possible. Consider, on the other hand, an experiment in 
which both the treatment variable and the dependent variable are of the 
interval or ratio type. The treatment variable may consist of five 
equally spaced dosages of a drug, and the dependent variable may be 



sec. 17.4 


Single-factor experiments 


273 


reaction time. Here the investigator may not only compare the mean 
reaction times for each dosage with every other dosage, but he may 
explore the nature of the functional relation between the two variables. 
Reaction time may increase or decrease in a linear fashion with change in 
the treatment variable; it may increase and then decrease; or some other 
type of relation may exist. Here we have considered two experiments. 
In one experiment both variables were nominal; in the other, both were 
either interval or ratio. In many experiments the treatment variables 
may be nominal, and the dependent variable may be of the interval or 
ratio type; or the treatment variable may be ordinal, and the dependent 
variable may be nominal, and so on. The nature of the variables deter- 
mines the method of analysis employed and the nature of the conclusions 
drawn.


17.4

Single-factor experiments 

Many experiments involve a single treatment, or classification, variable
with two or more levels. Let us initially consider experiments in which
the single factor is a treatment variable with k levels, and not a classification
variable. Such experiments are of a variety of types, some of which
are discussed here. First, a group of experimental subjects may be
divided into k independent groups, using a random method. A different
treatment may then be applied to each group. One group may be a control
group, that is, a group to which no treatment is applied. A meaningful
interpretation of the experiment may require a comparison of
results obtained under treatment with results obtained in the absence of
treatment. Comparisons may be made between treatments and a control,
between treatments, or both. Second, some single-factor experiments
involve a single group of subjects. Each subject receives all k
treatments. Repeated observations or measurements are made under
k conditions, one of which may be a control condition, on the same subjects.
In such an experiment as this the measurements made under the
k treatments will not be independent. Positive correlation will usually
exist between the paired measurements obtained under any two treatments.
These correlations will reduce the magnitude of the error used
in the comparisons of the separate treatment means. Third, a single-factor
experiment may consist of groups that are matched on one or more
variables which are known to be correlated with the dependent variable.
In an experiment on three different methods of teaching French, IQ may
be known to be correlated with achievement in French. Three groups of
subjects, paired or matched subject by subject by IQ, may be used. The
rationale here is that because IQ is correlated with French achievement,
a correlation between French achievement scores for paired subjects may
result. The error term would be reduced thereby. Matching serves no 
purpose unless positive correlations between the paired values on the 
dependent variable are obtained. In some practical experimental
situations the gains made by matching are trivial in relation to the work
involved. An interesting variant of the matched-group experiment is 
one in which subjects are not matched subject by subject but are matched 
with regard to distribution on one or more control variables. This 
results in the same error reduction as the individual matching of subjects 
(see McNemar, 1962). 
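
The way in which positive correlation reduces the error term may be seen from the usual expression for the sampling variance of the difference between two correlated means. The symbols below are assumptions of this note: $s_{\bar{X}_1}$ and $s_{\bar{X}_2}$ are the standard errors of the two treatment means, and $r_{12}$ is the correlation between the paired measurements.

$$s_{\bar{X}_1 - \bar{X}_2}^2 = s_{\bar{X}_1}^2 + s_{\bar{X}_2}^2 - 2 r_{12}\, s_{\bar{X}_1} s_{\bar{X}_2}$$

When $r_{12} = 0$, as with independent groups, the last term vanishes. When $r_{12}$ is positive, as with repeated measurements or successful matching, the last term is subtracted and the error is smaller. When $r_{12}$ is near zero, matching yields little or no gain.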

17.5

Randomization 

In the design of experiments frequent use is made of randomization.
In general the purpose of randomization is to ensure that extraneous 
variables which are concomitant with the dependent variable, and may be 
correlated with it, will not introduce systematic bias in the experimental 
results. Consider again the experiment in which three methods of teach- 
ing French are compared. If subjects are assigned to the three groups 
using a random method, IQ, which may be correlated with French 
achievement, will not vary in any systematic way from group to group. 
Also, an indefinitely large number of other variables, the specifications 
of many of which may not have entered the purview of the investigator, 
will be rendered powerless to introduce systematic bias. Although 
methods of randomization differ from one type of experiment to another, 
the general purpose of randomization is to protect the validity of the 
experiment by controlling the biasing influence of extraneous variables. 
A quotation from Cochran and Cox (1957) is relevant here.

Randomization is somewhat analogous to insurance, in that it 
is a precaution against disturbances which may or may not occur 
and that may or may not be serious if they do occur. It is generally
advisable to take the trouble to randomize even when it is not 
expected that there will be any serious bias from failure to random- 
ize. The experimenter is thus protected against unusual events 
that upset his expectations. 

In experimental design the term “randomization” refers not to any 
subjective impression of haphazard arrangement but to clearly stated 
operational procedures. These procedures may involve the tossing of 
coins, the drawing of numbered cards from a well-shuffled deck, or the use 
of tables of random numbers. Such tables are composed of series of digits 
from zero to nine. Each digit occurs with approximately equal
frequency, and adjacent digits are independent of each other. Tables of
random numbers are available in Fisher and Yates (1963), and elsewhere.
To illustrate, suppose we wish to select a sample of 12 experimental sub- 
jects from a sample of 40 subjects. Identify the subjects by the numbers 
1 to 40. Enter a row, column, or diagonal of two-digit numbers in a 
table of random numbers, and select the first 12 different numbers 
between 1 and 40 as they occur. This procedure will yield a random set. 
of 12 subjects. Other procedures may, of course, be used. 
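
The selection procedure just described can be imitated on a computer. The following is a minimal sketch in Python, assuming that a pseudorandom generator may stand in for the printed table of random numbers; the function name `select_subjects` and the seed are illustrative choices, not part of the procedure itself.

```python
import random


def select_subjects(pool_size=40, sample_size=12, seed=None):
    """Read two-digit numbers one at a time and keep the first
    `sample_size` different numbers that fall between 1 and `pool_size`."""
    rng = random.Random(seed)
    chosen = []
    while len(chosen) < sample_size:
        number = rng.randint(0, 99)  # one two-digit entry, 00 to 99
        if 1 <= number <= pool_size and number not in chosen:
            chosen.append(number)
    return chosen


subjects = select_subjects(seed=1)
print(subjects)
```

Numbers above 40 and repeated numbers are simply passed over, as they would be in reading down a column of the table.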

In a single-factor experiment with k independent groups of $n_1$, 
$n_2$, ..., $n_k$ subjects, an appropriate procedure is to choose $n_1$ subjects 
at random for the first group, $n_2$ at random for the second, and so on. 
This procedure is continued until $n_k$ subjects remain for the kth group. 
In an experiment in which k experimental treatments are administered 
to the same group of n subjects and repeated measurements are made on 
the same subjects, an order effect may exist; that is, the result observed 
may not be independent of the order of treatment for reason of fatigue, 
practice, or some other cause. For example, measures of reaction time 
may be made on the same group of subjects under four different dosages 
of a drug. Here quite clearly an order effect may exist. In such an 
experiment as this the order of treatments may be randomized for each 
subject, thereby eliminating any systematic influence of order on the 
treatment means. When there are two treatments only, A and B, a 
common practice is to use the order AB for half the subjects and the order 
BA for the other half, assigning subjects to orders at random. In 
experiments involving matched groups, the members of each matched pair, 
triplet, or quadruplet may be allocated to treatments by a random 
method. 
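
Both procedures, random assignment to k groups and the random allocation of the orders AB and BA, may be sketched in Python as follows. This is an illustration only: the function names, the group sizes, and the seed are assumptions made for the example. Shuffling the subjects once and cutting the shuffled list into consecutive pieces is equivalent to drawing the groups one after another at random.

```python
import random


def assign_to_groups(subject_ids, group_sizes, rng):
    """Shuffle the subjects, then cut the list into groups of the given sizes."""
    subject_ids = list(subject_ids)
    if sum(group_sizes) != len(subject_ids):
        raise ValueError("group sizes must add up to the number of subjects")
    rng.shuffle(subject_ids)
    groups, start = [], 0
    for size in group_sizes:
        groups.append(subject_ids[start:start + size])
        start += size
    return groups


def counterbalance_ab(subject_ids, rng):
    """Give the order AB to a random half of the subjects and BA to the rest."""
    subject_ids = list(subject_ids)
    rng.shuffle(subject_ids)
    half = len(subject_ids) // 2
    return {s: ("AB" if i < half else "BA") for i, s in enumerate(subject_ids)}


rng = random.Random(7)
groups = assign_to_groups(range(1, 31), [10, 10, 10], rng)
orders = counterbalance_ab(range(1, 21), rng)
print(groups)
print(orders)
```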


17.6 

Factorial experiments 

The experiments discussed hitherto in this chapter have involved a single 
independent variable or factor. Experiments may, however, be designed 
to study simultaneously the effect of two or more independent variables. 
For example, the experiment in which three methods of teaching French 
are compared may be extended to include a comparison of spaced versus 
massed learning. Under the spaced conditions, subjects receive short 
periods of intensive instruction separated by time intervals. Under the 
massed learning conditions subjects receive intensive instruction for a 
prolonged time period. The effect on achievement of the six possible 
combinations of three methods of teaching and two learning conditions 
may be investigated, each combination being applied to a different group 
of experimental subjects. Such an experiment is called a factorial 
experiment. Experiments in which the treatments are combinations of 
levels of two or more factors are said to be factorial. If all possible 
treatment combinations are studied, the experiment is said to be a 
complete factorial experiment. In some experiments the factors have two 
levels only. We may speak of a 2 X 2 experiment, a 2 X 2 X 2 experiment, 
or a $2^n$ experiment, where n is the number of factors. 

It is informative to examine in more detail the structural features of a 
factorial experiment. Let us suppose that in our 3 X 2 experiment on 
teaching French in relation to spaced versus massed learning, four 
subjects were used in each of the six groups and the following achievement 
test scores obtained. 

Learning                     Teaching method 
condition          A                 B                 C 

Spaced       72, 96, 34, 55    78, 25, 29, 20    16, 24, 29, 41 

Massed       81, 99, 85, 62    75, 36, 45, 26    19, 41, 19, 46 




Three variables are involved in this experiment. Two of these, 
method and learning condition, are independent variables, and one, 
achievement-test score, is the dependent variable. Three variate values 
are available for each subject: the method used, the learning condition, 
and the achievement-test score. Method and learning condition are 
nominal variables. These data could readily be written in columnar 
fashion as follows: 


Learning    Teaching   Test      Learning    Teaching   Test 
condition   method     score     condition   method     score 

S           A          72        M           A          81 
S           A          96        M           A          99 
S           A          34        M           A          85 
S           A          55        M           A          62 
S           B          78        M           B          75 
S           B          25        M           B          36 
S           B          29        M           B          45 
S           B          20        M           B          26 
S           C          16        M           C          19 
S           C          24        M           C          41 
S           C          29        M           C          19 
S           C          41        M           C          46 
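
The passage from the two-way table to the columnar listing is mechanical, and may be sketched in Python as follows. The scores are those of the 3 X 2 experiment above; the variable names are illustrative assumptions.

```python
# Cell scores of the 3 X 2 experiment, keyed by (learning condition, method).
scores = {
    ("S", "A"): [72, 96, 34, 55],
    ("S", "B"): [78, 25, 29, 20],
    ("S", "C"): [16, 24, 29, 41],
    ("M", "A"): [81, 99, 85, 62],
    ("M", "B"): [75, 36, 45, 26],
    ("M", "C"): [19, 41, 19, 46],
}

# One record per subject: (learning condition, teaching method, test score).
records = [
    (condition, method, score)
    for (condition, method), cell in scores.items()
    for score in cell
]

for record in records[:4]:
    print(record)
```

Each record carries the three variate values available for one subject, two nominal and one measured.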





The purpose of this experiment is to explore the relation between the 
learning condition and test scores and the teaching method and test 
scores. The relations between the six combinations of learning con- 
dition and teaching method may also be studied. An important feature 
of the structure of this experiment is that the two nominal variables, 
learning condition and teaching method, are independent of each other. 
By choosing six groups of equal size, the independence of these two 
variables is assured. If the numbers of cases in the six groups, when 
written in the form of a 2 X 3 contingency table, are proportional to the 
marginal totals, the independence of the two treatment variables will 
also be assured. In the design of factorial experiments the groups should 
be either of equal size or proportional. Departures from equality or 
proportionality should be avoided. 
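
Whether a set of group sizes is proportional can be checked directly: every cell frequency must equal the product of its row total and column total divided by N. The following Python sketch performs this check with whole-number arithmetic; the function name and the three sets of frequencies are assumptions made for illustration.

```python
def is_proportional(cell_counts):
    """True when every cell count equals (row total x column total) / N."""
    row_totals = [sum(row) for row in cell_counts]
    col_totals = [sum(col) for col in zip(*cell_counts)]
    n_total = sum(row_totals)
    # Cross-multiplying avoids division and any rounding error.
    return all(
        count * n_total == row_totals[i] * col_totals[j]
        for i, row in enumerate(cell_counts)
        for j, count in enumerate(row)
    )


print(is_proportional([[4, 4, 4], [4, 4, 4]]))        # equal groups
print(is_proportional([[10, 20, 30], [20, 40, 60]]))  # proportional groups
print(is_proportional([[4, 4, 4], [4, 4, 5]]))        # one extra subject
```

Equal groups are the particular case in which all marginal totals in each direction are the same.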

One advantage of the factorial experiment is that information is 
obtained about the interaction between factors. For example, in the 
experiment on teaching French one method of teaching may interact 
with a condition of learning and render that combination either better 
or worse than any other combination. The concept of interaction is 
discussed more explicitly, and in detail, in Sec. 19.5. One disadvantage of 
the factorial experiment is that the number of combinations may become 
quite unwieldy, and, from a practical point of view, the experiment may 
be very difficult to conduct. Also, the meaningful interpretation of the 
interactions may prove difficult. In general, in psychology and edu- 
cation, it is usually advisable to avoid factorial experiments with more 
than a few factors. 

The reader should note that factorial experiments may involve 
repeated measurement on the same subjects. For example, in a 3 X 2 
design, repeated measurements may be made on the same subjects under 
each of the six treatment combinations. The result is an N X 3 X 2 
arrangement of numbers. 

17.7 

Other experimental designs 

Situations occur where a basis exists for the classification of experimental 
subjects into r subgroups or blocks. An experiment may be conducted 
involving k treatments, where the number of subjects used is such that 
N = rk. Subjects in each block may be assigned at random to treatments, 
under the restriction that each treatment occur once only in each 
block. Such an experiment is a randomized block experiment. To illustrate, 
an experiment involves a comparison of four treatments. A sample 
of 24 experimental animals is used, which consists of six sets of four litter 
mates. Each set of litter mates is a block. Treatments are allocated 
to the animals in each block, or vice versa, on a random basis. If we 
designate the four treatments as A, B, C, and D, the allocation of 
treatments to subjects may be as follows: 



In previous discussion of order effects in single-factor experiments 
involving repeated measurements on the same subjects, it was suggested that 
the order of treatment be randomized for each subject. This is, in fact, 
a randomized block design, where the subject becomes the analogue of 
the block, and the order of presentation of treatment conditions A, B, C, D 
is represented by column headings. 
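
A randomized block allocation of the kind described, with six blocks of four litter mates and treatments A, B, C, and D, may be generated as in the following Python sketch. The function name and the seed are illustrative assumptions, and the arrangement printed is simply one random outcome, not an allocation taken from any particular experiment.

```python
import random


def randomized_blocks(n_blocks, treatments, rng):
    """For each block, put the treatments in an independent random order.

    Position 1 in a row is the first animal (or first occasion) in that
    block, position 2 the second, and so on.
    """
    allocation = []
    for _ in range(n_blocks):
        order = list(treatments)
        rng.shuffle(order)
        allocation.append(order)
    return allocation


rng = random.Random(3)
plan = randomized_blocks(6, "ABCD", rng)
for block_number, row in enumerate(plan, start=1):
    print(block_number, " ".join(row))
```

Each treatment occurs once only in each block, which is the defining restriction of the design.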

Experiments may be considered in which the number of blocks is 
equal to the number of treatments or a multiple thereof; thus $N = k^2$ 
or some multiple of $k^2$. In such experiments each treatment occurs 
once only in each block and once only over blocks. Thus a particular 
arrangement may be 

A B D C 

D A C B 

C D B A 

B C A D 


This is a Latin square design. A Latin square is simply an arrangement 
in which each treatment occurs once only in each row and column. 
Tables of Latin squares appear in Fisher and Yates (1963). These 
authors suggest methods of choosing a Latin square at random from sets 
of possible Latin squares. When an experiment requires more than one 
Latin square, they must be chosen independently. 
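
The defining property of a Latin square is easily verified by computer. The Python sketch below checks the 4 X 4 arrangement given above and also builds a square of any size by randomly permuting the rows, columns, and letters of a cyclic square. This construction is an assumption made for illustration; it is not the procedure of Fisher and Yates, and it does not in general give every possible Latin square an equal chance of selection.

```python
import random


def is_latin_square(square):
    """True if each symbol occurs once only in every row and every column."""
    k = len(square)
    symbols = set(square[0])
    if len(symbols) != k:
        return False
    rows_ok = all(len(row) == k and set(row) == symbols for row in square)
    cols_ok = all(set(col) == symbols for col in zip(*square))
    return rows_ok and cols_ok


def random_latin_square(treatments, rng):
    """Permute rows, columns, and labels of the cyclic k X k square."""
    k = len(treatments)
    cyclic = [[(r + c) % k for c in range(k)] for r in range(k)]
    rows, cols, labels = list(range(k)), list(range(k)), list(treatments)
    rng.shuffle(rows)
    rng.shuffle(cols)
    rng.shuffle(labels)
    return [[labels[cyclic[r][c]] for c in cols] for r in rows]


book_square = [list("ABDC"), list("DACB"), list("CDBA"), list("BCAD")]
print(is_latin_square(book_square))

square = random_latin_square("ABCDE", random.Random(5))
for row in square:
    print(" ".join(row))
```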


17.8 

Classification variables 

Hitherto we have discussed experiments in which the independent variables 
are treatment variables. In many investigations in psychology and 
education the independent variables are not treatment variables but are, 
in fact, classification variables; that is, subjects are classified according 
to a characteristic which was present prior to the conduct of the experi- 
ment and did not result from the manipulations of the investigator. For 
example, normals, neurotics, and psychotics may be compared on flicker- 
fusion rates, or some other variable. Here the independent variable is 
a classification, and not a treatment, variable. No treatment by the 
investigator is involved. Quite clearly in such an experiment it is not 
possible to assign subjects to experimental groups at random. Random- 
ization is not possible, because the attribute which determines member- 
ship in a particular group is not under the control of the investigator. 

Three types of situations in which classification variables are involved 
may be recognized. First, the sample of subjects used may be a random 
sample drawn from a defined population, and the proportions in the 
various classes, or strata, may, within the limits of sampling error, 
correspond to the population proportions. The object of the investigation is 
to describe relations between the classification variable and some other 
variable. An investigation of this type is, in effect, not an experiment 
at all, but a correlational study. Second, samples may be drawn for 
comparison at random from two or more subpopulations, but the 
proportions in these samples may not correspond to the population 
proportions. For example, an investigator may choose to compare 50 
normals with 50 psychotics on a specified variable or variables. Clearly 
the proportions in the two samples do not correspond to the proportions 
of normals and psychotics in the population. This, in effect, is a form 
of disproportional stratified sampling. Also we note that certain classes 
which exist in the population at large may be excluded. Thus, for 
example, not all nonpsychotics are necessarily normal. This again is, 
in effect, a correlational study, the object of which is simply to identify 
differences between groups. Frequently there is a presupposition that 
a knowledge of such differences may ultimately prove useful in the 
construction of causal arguments. Third, investigations may be conducted 
involving classification variables in which the attempt is made to control 
the influence of certain variables which in a straightforward correlational 
study would not be controlled. In comparing hospitalized normals with 
hospitalized psychotics, the two groups may be equated for age, IQ, sex, 
length of hospitalization, socioeconomic status, and other variables which 
might be construed to be correlated with the dependent variable. In 
studies such as this the investigator excludes certain variables from the 
group of causal influences affecting the results observed. Although no 
direct causal argument linking the classification variable and the 
dependent variable can be advanced, the influence of certain variables on the 
results observed can be excluded. Such investigations narrow the range 
of possible causal influences. Factorial investigations involving 
classification variables may perhaps be construed legitimately to be 
experiments in that by design they involve the experimental control of certain 
variables which otherwise might be uncontrolled. 

17.9 

Concluding observation 

Discussion in this chapter has touched briefly on a few aspects only of the 
structure and planning of experiments. This subject can be elaborated 
at great length and complexity. For a more comprehensive discussion 
the reader is referred to Cochran and Cox (1957), Cox (1961), Finney 
(1960), and Winer (1962). 

EXERCISES 

1 Distinguish, with examples, between (a) an independent and a 
dependent variable and (b) a treatment and a classification variable. 

2 Discuss the rationale underlying the use of matched samples in the 
planning of experiments. 

3 Why does randomization eliminate systematic bias in experimental 
results? 

4 Outline a randomization procedure for allocating 100 experimental 
subjects to five experimental groups. 

5 A group of 10 experimental subjects receives four treatments. De- 
scribe a procedure to eliminate order effects. 

6 What is the rationale underlying the use of equal or proportional 
groups in factorial experiments? 

7 What is meant by a randomized block experiment? 

8 Give an example of a Latin square containing five rows and five 
columns. 

9 How would you distinguish between an experiment and a correlational 
study? 



Analysis of Variance: 

One-way 

Classification 


18.1 

Its nature and purpose 

The analysis of variance is a method for dividing the variation observed 
in experimental data into different parts, each part assignable to a known 
source, cause, or factor. We may assess the relative magnitude of vari- 
ation resulting from different sources and ascertain whether a particular 
part of the variation is greater than expectation under the null hypothesis. 
The analysis of variance is inextricably associated with the design of 
experiments. Obviously, if we are to relate different parts of the vari- 
ation to particular causal circumstances, experiments must be designed 
to permit this to occur in a logically rigorous fashion. 

The partitioning of variance is a common occurrence in statistics. 
The particular body of technology known as the analysis of variance was 
developed by R. A. Fisher and reported by him in 1923. Since that 
time it has found wide application in many areas of experimentation. Its 
early applications were in the field of agriculture. If the variance is 
understood as the square of the standard deviation of a variable X, $s_x^2$, 
the analysis of variance does not in fact divide this variance into additive 
parts. The method divides the sum of squares $\sum (X - \bar{X})^2$ into additive 
parts. These are used in the application of tests of significance to the 
data. 

In its simplest form the analysis of variance is used to test the signifi- 
cance of the differences between the means of a number of different 
samples. We may wish to test the effects of k treatments. These may 
be different methods of memorizing nonsense syllables, different methods 
of instruction, or different dosages of a drug. A different treatment is 
applied to each of the k samples, each sample being comprised of n mem- 
bers. Members are assigned to treatments at random. The means of 
the k samples are calculated. The null hypothesis is formulated that 
the samples are drawn from populations having the same mean. Assum- 
ing that the treatments applied are having no effect, some variation due 
to sampling fluctuation is expected between means. If the variation 
cannot reasonably be attributed to sampling error, we reject the null 
hypothesis and accept the alternative hypothesis that the treatments 
applied are having an effect. With only two means, k = 2, this approach 
leads to the same result as that obtained from the t test for the significance 
of the difference between means for independent samples. 
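
The agreement with the t test when k = 2 may be checked numerically: the F ratio of the analysis of variance, developed in the sections that follow, equals the square of t computed with a pooled variance estimate. The Python sketch below uses two small sets of made-up scores; the data and the function names are assumptions introduced only for this illustration.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def pooled_t(a, b):
    """t for two independent samples, using the pooled variance estimate."""
    ss_a = sum((x - mean(a)) ** 2 for x in a)
    ss_b = sum((x - mean(b)) ** 2 for x in b)
    pooled_variance = (ss_a + ss_b) / (len(a) + len(b) - 2)
    standard_error = sqrt(pooled_variance * (1 / len(a) + 1 / len(b)))
    return (mean(a) - mean(b)) / standard_error


def f_ratio(groups):
    """Between-groups variance estimate divided by within-groups estimate."""
    all_values = [x for g in groups for x in g]
    grand = mean(all_values)
    k, n_total = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


group_1 = [5, 7, 6, 3, 9]          # made-up scores
group_2 = [9, 11, 8, 7, 10, 12]    # made-up scores
t = pooled_t(group_1, group_2)
f = f_ratio([group_1, group_2])
print(round(t ** 2, 6), round(f, 6))
```

The two printed values agree, apart from rounding in the arithmetic.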

Consider an agricultural experiment undertaken to compare yields 
of four varieties of wheat. Thirty-two experimental plots are prepared, 
and each of the four varieties grown in eight plots. Thus k = 4 and 
n = 8. Assume that appropriate precautions have been exercised to 
randomize uncontrolled factors such as variation in soil fertility from 
plot to plot. The yield for each plot is obtained, and the mean yield 
for each variety on the eight plots calculated. Differences in yield reflect 
themselves in the variation in the four means. If this variation is small 
and can be explained by sampling error, the investigator has no grounds 
for rejecting the null hypothesis that no difference exists between the 
yields of the four varieties. If the variation between means is not small 
and of such magnitude that it could arise in random sampling in less 
than 1 or 5 per cent of cases, then the evidence is sufficient to warrant 
rejection of the null hypothesis and acceptance of the alternative 
hypothesis that the varieties differ in yield. 

In the above agricultural experiment the sampling unit is the plot. 
In psychological experimentation the analogue of the plot is usually 
either a human subject or an experimental animal. In an experiment 
on the relative efficacy of four different methods of memorizing nonsense 
syllables, four groups of subjects may be selected, a different method 
used on each group, and means on a measure of recall obtained for the 
four groups. A comparison of these means provides information on the 
relative efficacy of the different methods, and the analysis of variance 
may be used to decide whether the variation between means is greater 
than that expected from random sampling fluctuation. 

The problem of testing the significance of the differences between a 
number of means results from experiments designed to study the variation 
in a dependent variable with variation in an independent variable. The 
independent variable may be varieties of wheat, methods of memorizing 
nonsense syllables, or different environmental conditions. The depend- 
ent variable may be crop yield, number of nonsense syllables recalled, or 
number of errors made by an animal in running a maze. Experiments 
which employ one independent variable are said to involve one basis of 
classification. The analysis of variance may be used in the analysis of 
data resulting from experiments which involve more than one basis of 
classification. For example, an experiment may be designed to permit 
the study both of varieties of wheat and types of fertilizer on crop yield. 





This experiment employs two independent variables. We wish to dis- 
cover how crop yield depends on these two variables. The analysis of 
variance may be used to extract a part of the total variation resulting 
from the differences in varieties of wheat and another part resulting from 
differences in fertilizers, in addition to interaction and error components. 
A further example is a psychological experiment designed to permit the 
study of the effects of both free-versus-restricted environment and 
early-versus-late blindness on maze performance in the rat. Here we have 
two independent variables. Each variable has two categories. There 
are four combinations of conditions: free environment and early 
blindness, free environment and late blindness, restricted environment and 
early blindness, restricted environment and late blindness. Four groups 
of experimental animals may be used, and one of the four conditions 
applied to each group. The analysis of variance may be applied to 
identify parts of the variation in maze performance assignable to the 
different environmental and blindness conditions, and other parts as well. 
Experiments may be designed to permit the simultaneous study of any 
number of experimental variables within practical limits. 

Let us proceed by considering in detail the simple case of a one-way 
classification problem where the analysis of variance provides a composite 
test of the significance of the difference between a set of means. 


18.2 

Notation for one-way analysis of variance 

Consider an experiment involving k experimental treatments. The treat- 
ments may be different dosages of a drug, different methods of memorizing 
nonsense syllables, or different environmental variations in the rearing 
of experimental animals. Each treatment is applied to a different 
experimental group. Denote the number of members in the k groups 
by $n_1, n_2, \ldots, n_k$. The number of members in the jth group is $n_j$. The 
total number of members in all groups combined is 

$$n_1 + n_2 + \cdots + n_k = N$$

When the groups are of equal size we may write 

$$n_1 = n_2 = \cdots = n_k = n = \frac{N}{k}$$


The data may be represented as follows: 



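$$
\begin{array}{cccc}
\text{Group 1} & \text{Group 2} & \cdots & \text{Group } k \\
\hline
X_{11} & X_{12} & \cdots & X_{1k} \\
X_{21} & X_{22} & \cdots & X_{2k} \\
X_{31} & X_{32} & \cdots & X_{3k} \\
\vdots & \vdots & & \vdots \\
X_{n_1 1} & X_{n_2 2} & \cdots & X_{n_k k} \\
\hline
\sum_{i=1}^{n_1} X_{i1} & \sum_{i=1}^{n_2} X_{i2} & \cdots & \sum_{i=1}^{n_k} X_{ik}
\end{array}
$$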


Here a system of double subscripts is used. The first subscript 
identifies the member of the group; the second identifies the group. Thus 
$X_{21}$ represents the measurement for the second member of the first group, 
$X_{32}$ represents the measurement for the third member of the second 
group, and so on. In general, the symbol $X_{ij}$ means the ith member of 
the jth group. Where the data for each group are tabulated in a separate 
column, the first subscript identifies the row and the second the column. 
The sums of measurements in the k groups are represented by 

$$\sum_{i=1}^{n_1} X_{i1}, \quad \sum_{i=1}^{n_2} X_{i2}, \quad \ldots, \quad \sum_{i=1}^{n_k} X_{ik}$$

We may denote the group means by $\bar{X}_{.1}, \bar{X}_{.2}, \ldots, \bar{X}_{.k}$. The 
symbol $\bar{X}_{.1}$ refers to the mean of the first column, $\bar{X}_{.2}$ the mean of the second 
column, and $\bar{X}_{.j}$ the mean of the jth column. The convention is to use 
a dot to indicate the variable subscript over which the summation 
extends. The mean of all the observations taken together may be 
represented by the symbol $\bar{X}_{..}$, sometimes called the grand mean. In 
a one-way classification the meaning associated with the various symbols 
is quite clear without the use of the dot notation. In discussion of 
one-way classification we shall therefore simplify the notation and represent 
the group means by $\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_k$ and the grand mean by $\bar{X}$. The 
dot notation is necessary in the more complex applications of the analysis 
of variance, and we shall return to it in Chap. 19. 
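
The notation may be made concrete with a small example. In the Python sketch below each group is held as a list, so that the group plays the part of a column of the data table. The scores are hypothetical and the helper `x(i, j)` is an assumption introduced only to mirror the double-subscript symbol for the ith member of the jth group.

```python
# Hypothetical scores: one inner list per group (one column of the table).
groups = [
    [3, 5, 4],       # group 1, n_1 = 3
    [6, 8, 7, 7],    # group 2, n_2 = 4
    [2, 4, 3],       # group 3, n_3 = 3
]


def x(i, j):
    """X_ij: the ith member of the jth group (both subscripts start at 1)."""
    return groups[j - 1][i - 1]


group_sizes = [len(g) for g in groups]      # n_1, ..., n_k
group_sums = [sum(g) for g in groups]       # the k column sums
group_means = [s / n for s, n in zip(group_sums, group_sizes)]
n_total = sum(group_sizes)                  # N
grand_mean = sum(group_sums) / n_total      # mean of all N observations

print(x(2, 1), x(3, 2))
print(group_sizes, group_sums)
print(group_means, grand_mean)
```

Note that with unequal groups the grand mean is the mean of all N observations, not the simple average of the group means.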

The total variation in the data is represented by the sum of squares 
of deviations of all the observations from the grand mean. The sum of 
squares of deviations of the $n_1$ observations in the first group from the 
grand mean is 

$$\sum_{i=1}^{n_1} (X_{i1} - \bar{X})^2$$

and the sum of squares of the $n_j$ observations in the jth group from the 
grand mean is 

$$\sum_{i=1}^{n_j} (X_{ij} - \bar{X})^2$$

For k groups each comprised of $n_j$ observations the total sum of squares of 
deviations about $\bar{X}$ is 

$$\sum_{j=1}^{k} \sum_{i=1}^{n_j} (X_{ij} - \bar{X})^2$$

When the meaning is clearly understood from the context, it is common 
practice to represent this total sum of squares by $\sum (X_{ij} - \bar{X})^2$, or more 
simply by $\sum (X - \bar{X})^2$. 


18.3 

Partitioning the sum of squares 

Simple algebra may be used to demonstrate that the total sum of squares 
may be divided into two additive and independent parts, a within-group 
sum of squares and a between-group sum of squares. We proceed by 
writing the identity 

$$(X_{ij} - \bar{X}) = (X_{ij} - \bar{X}_j) + (\bar{X}_j - \bar{X})$$

This identity states that the deviation of a particular score from the 
grand mean is comprised of two parts, a deviation from the mean of the 
group to which the score belongs $(X_{ij} - \bar{X}_j)$ and a deviation of the group 
mean from the grand mean $(\bar{X}_j - \bar{X})$. We square this identity and 
sum over the $n_j$ cases in the jth group to obtain 

$$\sum_{i=1}^{n_j} (X_{ij} - \bar{X})^2 = \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j)^2 + \sum_{i=1}^{n_j} (\bar{X}_j - \bar{X})^2 + 2(\bar{X}_j - \bar{X}) \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j)$$

The second term to the right requires the summation of a constant 
$(\bar{X}_j - \bar{X})^2$ over all $n_j$ values of the jth group and may be written 
$n_j(\bar{X}_j - \bar{X})^2$. The third term to the right disappears because the sum 
of deviations about the mean $\bar{X}_j$ is zero. We obtain thereby 

$$\sum_{i=1}^{n_j} (X_{ij} - \bar{X})^2 = \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j)^2 + n_j(\bar{X}_j - \bar{X})^2$$

This expression says that the sum of the squares of the deviations of the $n_j$ 
observations in the jth group from the grand mean $\bar{X}$ is equal to the sum 
of squares of deviations of the observations from the group mean plus 
$n_j$ times the square of the difference between the group mean and the 
grand mean. We now sum over the k groups to obtain 

$$\sum_{j=1}^{k} \sum_{i=1}^{n_j} (X_{ij} - \bar{X})^2 = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j)^2 + \sum_{j=1}^{k} n_j(\bar{X}_j - \bar{X})^2 \tag{18.1}$$

The term to the left is the total sum of squares: the sum of squares of all 
the observations from the grand mean $\bar{X}$. The first term to the right is 
the sum of squares within groups: the sum of squares of deviations 
from the respective group means. The second and last term to the right 
is the sum of squares between groups: the sum of squares of deviations of 
the group means from the grand mean, each term $(\bar{X}_j - \bar{X})^2$ being 
weighted by $n_j$, the number of cases in the group. Thus the total sum 
of squares is partitioned into two additive parts, a sum of squares within 
groups, and a sum of squares between groups. These two parts are 
independent. 
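
The partition may be verified numerically. The following Python sketch computes the three sums of squares directly from their definitions for a small set of hypothetical scores (an assumption made for illustration, with unequal group sizes) and shows that the within-groups and between-groups parts add to the total.

```python
def mean(values):
    return sum(values) / len(values)


def partition(groups):
    """Return (total, within, between) sums of squares for a list of groups."""
    grand = mean([x for g in groups for x in g])
    total = sum((x - grand) ** 2 for g in groups for x in g)
    within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    return total, within, between


groups = [[3, 5, 4], [6, 8, 7, 7], [2, 4, 3]]   # hypothetical scores
total, within, between = partition(groups)
print(round(total, 4), round(within, 4), round(between, 4))
print(round(within + between, 4))
```

For these scores the within-groups sum of squares is 6, the between-groups sum of squares is 30.9, and the total is 36.9.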


18.4 

The variance estimates or mean squares 

Each sum of squares has an associated number of degrees of freedom. 
The total number of observations is $n_1 + n_2 + \cdots + n_k = N$. 
The total sum of squares has $N - 1$ degrees of freedom. One degree of 
freedom is lost by taking deviations about the grand mean. $N - 1$ of 
these deviations are free to vary. The number of degrees of freedom 
associated with the within-groups sum of squares is 

$$(n_1 - 1) + (n_2 - 1) + \cdots + (n_k - 1) = \sum_{j=1}^{k} n_j - k = N - k$$

The number of degrees of freedom for each group is $n_j - 1$. Hence the 
number of degrees of freedom for k groups is $\sum n_j - k$, or $N - k$. The 
number of degrees of freedom associated with the between-groups sum of 
squares is $k - 1$. We have k means, and 1 degree of freedom is lost by 
expressing the group means as deviations from the grand mean. The 
degrees of freedom are additive: 

$$\underbrace{N - 1}_{\text{total}} = \underbrace{(N - k)}_{\text{within}} + \underbrace{(k - 1)}_{\text{between}}$$

The within- and between-groups sums of squares are divided by their 
associated degrees of freedom to obtain a within-groups variance estimate 
$s_w^2$ and a between-groups variance estimate $s_b^2$. Thus 

$$s_w^2 = \frac{\sum_{j=1}^{k} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j)^2}{N - k} \tag{18.2}$$

$$s_b^2 = \frac{\sum_{j=1}^{k} n_j(\bar{X}_j - \bar{X})^2}{k - 1} \tag{18.3}$$

The sums of squares and degrees of freedom are additive. The variance 
estimates are not additive. The variance estimate is sometimes spoken of 
as the mean square. 
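
The degrees of freedom and the two variance estimates may be obtained as in the following Python sketch. The scores are hypothetical and the function and key names are assumptions chosen for the illustration; the sums of squares are computed from their definitions.

```python
def mean(values):
    return sum(values) / len(values)


def variance_estimates(groups):
    """Degrees of freedom and mean squares for a one-way classification."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = mean([x for g in groups for x in g])
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    df_between, df_within = k - 1, n_total - k
    return {
        "df_between": df_between,
        "df_within": df_within,
        "df_total": n_total - 1,
        "ms_between": ss_between / df_between,   # between-groups estimate
        "ms_within": ss_within / df_within,      # within-groups estimate
    }


result = variance_estimates([[3, 5, 4], [6, 8, 7, 7], [2, 4, 3]])
for name, value in result.items():
    print(name, round(value, 4))
```

The degrees of freedom add to the total, whereas the two mean squares do not add to any meaningful total.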


18.5 

The meaning of the variance estimates 

What meaning attaches to the variance estimates $s_w^2$ and $s_b^2$? Let us 
assume that the k samples are drawn from populations having the same 
variance. The assumption is that $\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2 = \sigma^2$. If this 
assumption is tenable, the expected value of $s_w^2$ is $\sigma^2$; that is, 

$$E(s_w^2) = \sigma^2$$

Thus $s_w^2$ is an unbiased estimate of the population variance. It is an 
estimate obtained by combining the data for the k samples. It may be 
written in the form 

$$s_w^2 = \frac{\sum_{i=1}^{n_1} (X_{i1} - \bar{X}_1)^2 + \sum_{i=1}^{n_2} (X_{i2} - \bar{X}_2)^2 + \cdots + \sum_{i=1}^{n_k} (X_{ik} - \bar{X}_k)^2}{n_1 + n_2 + \cdots + n_k - k} \tag{18.4}$$

The reader will recall that in applying the t test to determine the 
significance of the differences between two means for independent samples, an 
unbiased estimate of the population variance was obtained by combining 
the sums of squares about the means of the two samples and dividing 
this by the total number of degrees of freedom. The within-group 
variance $s_w^2$ is an estimate of precisely the same type. It is obtained 
by adding together the sums of squares about the k sample means and 
dividing this by the total number of degrees of freedom. The variance 
estimate used in the t test is the particular case of $s_w^2$ which occurs when 
k = 2. 






The expected value of $s_b^2$ may be shown to be 

k / k 

l ^ ( N - l n,'/N 

where $\mu_j$ and $\mu$ are population means. Under the null hypothesis 

$$\mu_1 = \mu_2 = \cdots = \mu_k = \mu$$

and the second term to the right of the above expression is equal to zero. 
Hence under the null hypothesis both $s_w^2$ and $s_b^2$ are estimates of the 
population variance $\sigma^2$. 

That $s_b^2$ is an estimate of $\sigma^2$ under the null hypothesis may be 
illustrated by considering the particular situation where 

$$n_1 = n_2 = \cdots = n_k = n$$

The between-group variance estimate may then be written as 

$$s_b^2 = \frac{n \sum_{j=1}^{k} (\bar{X}_j - \bar{X})^2}{k - 1}$$

This is n times the variance of the k means, or $n s_{\bar{X}}^2$. The error variance 
of the sampling distribution of the arithmetic mean for samples of size n 
is given by $\sigma_{\bar{X}}^2 = \sigma^2/n$. Hence $n\sigma_{\bar{X}}^2 = \sigma^2$. The quantity $n s_{\bar{X}}^2$ is an 
estimate of $n\sigma_{\bar{X}}^2$, hence also of $\sigma^2$. Thus $s_b^2$ is an estimate of $\sigma^2$. 
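
The statement that both variance estimates are estimates of the same population variance under the null hypothesis may be illustrated by sampling experiment. The Python sketch below draws k = 4 samples of n = 8 repeatedly from a single normal population and averages the two estimates over many repetitions. The population mean of 50, the standard deviation of 2, the number of repetitions, and the seed are all assumptions made for the illustration.

```python
import random


def mean(values):
    return sum(values) / len(values)


def variance_estimates(groups):
    """Return (between-groups estimate, within-groups estimate)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = mean([x for g in groups for x in g])
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return ss_between / (k - 1), ss_within / (n_total - k)


def average_estimates(k, n, sigma, replications, rng):
    """Average both estimates over many experiments in which the null
    hypothesis is true: every group comes from the same population."""
    sum_b = sum_w = 0.0
    for _ in range(replications):
        groups = [[rng.gauss(50.0, sigma) for _ in range(n)] for _ in range(k)]
        s_b2, s_w2 = variance_estimates(groups)
        sum_b += s_b2
        sum_w += s_w2
    return sum_b / replications, sum_w / replications


avg_b, avg_w = average_estimates(k=4, n=8, sigma=2.0, replications=2000,
                                 rng=random.Random(18))
print(round(avg_b, 2), round(avg_w, 2))  # both should lie near sigma squared, 4
```

The between-groups average fluctuates more from run to run than the within-groups average, because it rests on only k − 1 degrees of freedom.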

Where the null hypothesis is false and the means of the populations 
from which the k samples are drawn differ one from another, the second 
term to the right of the expression for $E(s_b^2)$ is not equal to zero. It is 
a measure of the variation of the separate population means $\mu_j$ from the 
grand mean $\mu$. 

To test the hypothesis $H_0\colon \mu_1 = \mu_2 = \cdots = \mu_k$, consider the ratio 
$s_b^2/s_w^2$. This is an F ratio. Under the null hypothesis this ratio is 
expected to be close to unity, since $E(s_b^2) = E(s_w^2) = \sigma^2$. If the population 
means differ from each other, the ratio will tend to be greater than unity. 
If $s_b^2/s_w^2$ is found to be significantly greater than unity, this may be 
construed to be evidence for the rejection of the null hypothesis and for the 
acceptance of the alternative hypothesis that differences exist between the 
population means. The significance of the F ratio $s_b^2/s_w^2$ may be assessed with 
reference to the table of F (Table D of the Appendix) with $k - 1$ degrees 
of freedom associated with the numerator and $N - k$ degrees of freedom 
associated with the denominator. 






18.6 

Computation formulas 

The calculation of the required sums of squares may be simplified by the 
use of computation formulas. To simplify the notation, denote the sum 
of all the observations in the jth group by $T_j$. Thus 

$$\sum_{i=1}^{n_j} X_{ij} = T_j$$

Denote the sum of all observations in the k groups by T. Thus 

$$\sum_{j=1}^{k} \sum_{i=1}^{n_j} X_{ij} = T$$

The computation formulas are readily obtained. The formula for the 
total sum of squares is 

$$\sum_{j=1}^{k} \sum_{i=1}^{n_j} (X_{ij} - \bar{X})^2 = \sum_{j=1}^{k} \sum_{i=1}^{n_j} X_{ij}^2 - \frac{T^2}{N} \tag{18.6}$$

Thus we find the sum of squares of all observations and subtract $T^2/N$. 
The within-groups sum of squares is 

$$\sum_{j=1}^{k} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j)^2 = \sum_{j=1}^{k} \sum_{i=1}^{n_j} X_{ij}^2 - \sum_{j=1}^{k} \frac{T_j^2}{n_j} \tag{18.7}$$

The quantity $T_j^2/n_j$ is the square of the sum of the jth group divided by 
the number of cases in that group. These values are calculated and 
summed over the k groups. The between-groups sum of squares is 

$$\sum_{j=1}^{k} n_j(\bar{X}_j - \bar{X})^2 = \sum_{j=1}^{k} \frac{T_j^2}{n_j} - \frac{T^2}{N} \tag{18.8}$$

The above formulas are generally applicable to groups of unequal or 
equal size. In the particular case where the groups are of equal size and 
$n_1 = n_2 = \cdots = n_k = n$, the within-groups sum of squares may be 
written as 

$$\sum_{j=1}^{k} \sum_{i=1}^{n} X_{ij}^2 - \frac{1}{n} \sum_{j=1}^{k} T_j^2 \tag{18.9}$$

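Table 18.1 

Analysis of variance: one-way classification, summary of formulas 

$$
\begin{array}{llllll}
\text{Source of} & \text{Sum of squares} & \text{Degrees of} & \text{Variance} & \text{Computation} & \text{Computation formulas,} \\
\text{variation} & & \text{freedom} & \text{estimate} & \text{formulas} & \text{equal groups} \\
\hline
\text{Between} & \sum_{j} n_j(\bar{X}_j - \bar{X})^2 & k - 1 & s_b^2 & \sum_{j} \dfrac{T_j^2}{n_j} - \dfrac{T^2}{N} & \dfrac{1}{n} \sum_{j} T_j^2 - \dfrac{T^2}{N} \\
\text{Within} & \sum_{j} \sum_{i} (X_{ij} - \bar{X}_j)^2 & N - k & s_w^2 & \sum_{j} \sum_{i} X_{ij}^2 - \sum_{j} \dfrac{T_j^2}{n_j} & \sum_{j} \sum_{i} X_{ij}^2 - \dfrac{1}{n} \sum_{j} T_j^2 \\
\text{Total} & \sum_{j} \sum_{i} (X_{ij} - \bar{X})^2 & N - 1 & & \sum_{j} \sum_{i} X_{ij}^2 - \dfrac{T^2}{N} & \sum_{j} \sum_{i} X_{ij}^2 - \dfrac{T^2}{N}
\end{array}
$$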


sec. 18.8 


Illustrative example : one-way classification 


2QI 


and the between-groups sum of squares becomes 


k 



n N 


18.10 
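The agreement between the definitional sums of squares and the computation formulas can be checked numerically. The following is a minimal Python sketch, assuming a small set of hypothetical scores for three groups of unequal size; the variable names are illustrative only and do not come from the text.

```python
# Hypothetical scores for three groups of unequal size (illustration only).
groups = [[2, 4, 6], [5, 7], [1, 3, 5, 7]]


def mean(xs):
    """Arithmetic mean of a list of numbers."""
    return sum(xs) / len(xs)


all_x = [x for g in groups for x in g]
N = len(all_x)            # total number of observations
T = sum(all_x)            # sum of all observations
grand_mean = T / N

# Definitional forms: sums of squared deviations.
ss_total_def = sum((x - grand_mean) ** 2 for x in all_x)
ss_within_def = sum((x - mean(g)) ** 2 for g in groups for x in g)
ss_between_def = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)

# Computation formulas: raw sums of squares and group totals only.
sum_x2 = sum(x * x for x in all_x)                           # sum of X squared
sum_tj2_over_nj = sum(sum(g) ** 2 / len(g) for g in groups)  # sum of T_j^2 / n_j
ss_total = sum_x2 - T ** 2 / N
ss_within = sum_x2 - sum_tj2_over_nj
ss_between = sum_tj2_over_nj - T ** 2 / N

print(ss_total, ss_within, ss_between)
```

The computation formulas need only the raw scores, their squares, and the group totals, which is why they are convenient for hand or desk-calculator work.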


18.7

Summary

Table 18.1 presents in summary form the formulas hitherto discussed.

In summary, to test the significance of the difference between k
means using the analysis of variance, the following steps are involved:

1 Partition the total sum of squares into two components,
a within-groups and a between-groups sum of
squares, using the appropriate computation formulas.

2 Divide these sums of squares by the associated number
of degrees of freedom to obtain $s_w^2$ and $s_b^2$, the
within- and between-groups variance estimates.

3 Calculate the F ratio $s_b^2/s_w^2$ and refer this to the table
of F (Table D of the Appendix).

4 If the probability of obtaining the observed F value is
small, say, less than .05 or .01, under the null hypothesis,
reject that hypothesis.


18.8

Illustrative example: one-way classification

Table 18.2 shows the number of nonsense syllables recalled by four groups
of subjects using four different methods of presentation. Fictitious data
are used here for simplicity of illustration. The sums of squares have
been calculated using the computation formulas. The data are presented
in summary form in Table 18.3. The number of groups is 4. The
number of degrees of freedom associated with the between-groups sum
of squares is k - 1 = 4 - 1 = 3. The number of degrees of freedom
associated with the within-groups sum of squares is

N - k = 26 - 4 = 22

The number of degrees of freedom associated with the total is

N - 1 = 26 - 1 = 25





Table 18.2

Computation for the analysis of variance: one-way classification;
number of nonsense syllables correctly recalled
under four methods of presentation

                             Method
                     I        II       III       IV

                     5         9         8        1
                     7        11         6        3
                     6         8         9        4
                     3         7         5        5
                     9         7         7        1
                     7                   4        4
                     4                   4
                     2

n_j                  8         5         7        6      N = 26
T_j                 43        42        43       18      T = 146
Mean              5.38      8.40      6.14     3.00      T^2/N = 819.85
Sum of X^2         269       364       287       68      Total sum of X^2 = 988
T_j^2/n_j       231.13    352.80    264.14    54.00      Sum of T_j^2/n_j = 902.07

Sum of squares

Between     902.07 - 819.85 =  82.22
Within      988    - 902.07 =  85.93
Total       988    - 819.85 = 168.15


Table 18.3

Analysis of variance for data of Table 18.2

Source of       Sum           Degrees of     Variance
variation       of squares    freedom        estimate

Between            82.22           3         27.41 = s_b^2
Within             85.93          22          3.91 = s_w^2

Total             168.15          25         F = 7.01





The between and within sums of squares are divided by the associated
degrees of freedom to obtain the variance estimates $s_b^2$ and $s_w^2$.

The F ratio is $s_b^2/s_w^2 = 27.41/3.91 = 7.01$. Consulting a table of F
with df = 3 associated with the numerator and df = 22 with the
denominator, we find that the value of F required for significance at the .01
level is 4.82. We may safely conclude that the method of presentation
affects the number of nonsense syllables recalled.
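The arithmetic of this example can be retraced in a few lines of code. The sketch below is an illustration in Python, not part of the original text: the function name `one_way_anova` and the dictionary keys are assumptions, and the scores are those of Table 18.2. Because the code carries full precision, the F ratio it produces may differ in the second decimal place from the 7.01 obtained from the rounded variance estimates.

```python
def one_way_anova(groups):
    """One-way analysis of variance using the computation formulas.

    `groups` is a list of lists of scores, one inner list per group.
    Returns the sums of squares, degrees of freedom, variance estimates, and F.
    """
    k = len(groups)
    N = sum(len(g) for g in groups)
    T = sum(sum(g) for g in groups)
    sum_x2 = sum(x * x for g in groups for x in g)
    sum_tj2_nj = sum(sum(g) ** 2 / len(g) for g in groups)

    ss_between = sum_tj2_nj - T ** 2 / N
    ss_within = sum_x2 - sum_tj2_nj
    df_between, df_within = k - 1, N - k
    ms_between = ss_between / df_between   # between-groups variance estimate
    ms_within = ss_within / df_within      # within-groups variance estimate
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "ss_total": ss_between + ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "F": ms_between / ms_within,
    }


# Scores of Table 18.2, methods I to IV.
table_18_2 = [
    [5, 7, 6, 3, 9, 7, 4, 2],
    [9, 11, 8, 7, 7],
    [8, 6, 9, 5, 7, 4, 4],
    [1, 3, 4, 5, 1, 4],
]

result = one_way_anova(table_18_2)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

The observed F is then compared with the tabled value for 3 and 22 degrees of freedom, exactly as in the text; the code does not look up critical values.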


18.9 

The analysis of variance with two groups 

With two groups only, the significance of the differences between means 
may be tested using either a t test or the analysis of variance. These pro- 
cedures lead to the same result. Where k = 2 it may be readily shown 
that $\sqrt{F} = t$.

Consider a situation where k = 2 and $n_1 = n_2 = n$. Under these
circumstances the between-groups variance estimate $s_b^2$ is

$$s_b^2 = \frac{n(\bar{X}_1 - \bar{X})^2 + n(\bar{X}_2 - \bar{X})^2}{2 - 1}$$

For groups of equal size the grand mean $\bar{X}$ is halfway between the two
group means $\bar{X}_1$ and $\bar{X}_2$. Thus
$(\bar{X}_1 - \bar{X}) = -(\bar{X}_2 - \bar{X}) = \frac{1}{2}(\bar{X}_1 - \bar{X}_2)$
and $(\bar{X}_1 - \bar{X})^2 = (\bar{X}_2 - \bar{X})^2 = \frac{1}{4}(\bar{X}_1 - \bar{X}_2)^2$.
We may therefore write

$$s_b^2 = \frac{n}{2}(\bar{X}_1 - \bar{X}_2)^2$$

When k = 2 the within-groups variance estimate $s_w^2$ is the unbiased
variance estimate $s^2$, obtained by adding the two sums of squares about
the means of the two samples and dividing by the total number of degrees
of freedom (Sec. 18.5). Hence

$$F = \frac{s_b^2}{s_w^2} = \frac{(\bar{X}_1 - \bar{X}_2)^2}{s^2 (2/n)}$$

and

$$\sqrt{F} = \frac{\bar{X}_1 - \bar{X}_2}{s \sqrt{(1/n) + (1/n)}} = t \qquad (18.11)$$

Thus $\sqrt{F} = t$ and $F = t^2$. To illustrate, let $n_1 = n_2 = 8$. In applying
the analysis of variance with df = 1 associated with the numerator and
df = 14 associated with the denominator of the F ratio, an F of 4.60 is
required for significance at the .05 level. The corresponding t for
df = 14 required for significance at the .05 level is $\sqrt{4.60} = 2.145$. The





t test may be considered a particular case of the F test. It is a particular
case which arises when k = 2.

In the above discussion we have considered two groups of equal size.
The result $\sqrt{F} = t$ is, however, quite general and holds when $n_1$ and $n_2$
are unequal. For unequal groups the algebraic development is a bit
more cumbersome than that given here. The grand mean does not fall
midway between the two group means.
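The identity $F = t^2$ for unequal groups can be checked numerically without going through the algebra. The following Python sketch assumes two small hypothetical samples of sizes 5 and 4; the function names `pooled_t` and `f_ratio_two_groups` are illustrative and are not taken from the text.

```python
import math


def pooled_t(a, b):
    """t ratio for two independent samples using the pooled variance estimate."""
    n1, n2 = len(a), len(b)
    m1, m2 = sum(a) / n1, sum(b) / n2
    ss1 = sum((x - m1) ** 2 for x in a)
    ss2 = sum((x - m2) ** 2 for x in b)
    s2 = (ss1 + ss2) / (n1 + n2 - 2)          # unbiased pooled variance
    return (m1 - m2) / math.sqrt(s2 * (1 / n1 + 1 / n2))


def f_ratio_two_groups(a, b):
    """F ratio from a one-way analysis of variance with k = 2 groups."""
    n1, n2 = len(a), len(b)
    N = n1 + n2
    m1, m2 = sum(a) / n1, sum(b) / n2
    grand = (sum(a) + sum(b)) / N
    ss_between = n1 * (m1 - grand) ** 2 + n2 * (m2 - grand) ** 2
    ss_within = sum((x - m1) ** 2 for x in a) + sum((x - m2) ** 2 for x in b)
    return (ss_between / (2 - 1)) / (ss_within / (N - 2))


# Hypothetical scores; the groups are deliberately of unequal size.
group_1 = [12, 15, 11, 18, 14]
group_2 = [9, 10, 13, 8]

t = pooled_t(group_1, group_2)
F = f_ratio_two_groups(group_1, group_2)
print(f"t = {t:.4f}, t squared = {t * t:.4f}, F = {F:.4f}")
```

The same relation holds between the tabled critical values: squaring the t of 2.145 quoted above gives approximately 4.60.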

18.10 

Assumptions underlying the analysis of variance 

In the mathematical development of the analysis of variance a number of 
assumptions are made. Questions may be raised about the nature of 
these assumptions and the extent to which the failure of the data to satisfy
them leads to the drawing of invalid inferences. 

One assumption is that the distributions of the variables in the popu- 
lations from which the samples are drawn are normal. For large samples 
the normality of the distributions may be tested using a test of goodness 
of fit, although in practice this is rarely done. When the samples are
fairly small, it is usually not possible to rigorously demonstrate lack of 
normality in the data. Unless there is reason to suspect a fairly extreme 
departure from normality, it is probable that the conclusions drawn from 
the data using an F test will not be seriously affected. In general, the 
effect of departures from normality is to make the results appear some- 
what more significant than they are. Consequently, where a fairly gross 
departure from normality occurs, a somewhat more rigorous level of
confidence than usual may be employed. For a thorough discussion of
this problem the reader is referred to Lindquist (1953).

A further assumption in the application of the analysis of variance 
is that the variances in the populations from which the samples are drawn 
are equal. This is known as homogeneity of variance. A variety of 
tests of homogeneity of variance may be applied. These are discussed 
in more advanced texts (Johnson, 1949). Moderate departures from 
homogeneity should not seriously affect the inferences drawn from the
data. Gross departures from homogeneity may lead to results which 
are seriously in error. Under certain circumstances a transformation 
of the variable, which leads to greater uniformity of variance, may be 
used. Under other circumstances it may be possible to use a nonpara- 
metric procedure. 

A further assumption is that the effects of various factors on the total 
variation are additive, as distinct from, say, multiplicative. The basic 
model underlying the analysis of variance is that a given observation may 





be partitioned into independent and additive bits, each bit resulting 
from an identifiable source. In most situations there are no grounds to 
suspect the validity of this model. 

With most sets of real data the assumptions underlying the analysis 
of variance are, at best, only roughly satisfied. The raw data of experi- 
ments frequently do not exhibit the characteristics which the mathe- 
matical models require. One advantage of the analysis of variance is 
that reasonable departures from the assumptions of normality and homo- 
geneity may occur without seriously affecting the validity of the infer- 
ences drawn from the data. 

18.11

Comparison of means two at a time following an 
F test 

Following the application of an F test, a meaningful interpretation of the
data may require a comparison of pairs of means. The differences
between some pairs may be significant, while other differences may not
be. A number of alternative methods exist for the making of such
comparisons. 

A distinction is commonly made between a priori comparisons and 
a posteriori or post-mortem comparisons. A priori comparisons are
formulated prior to, and quite apart from, an inspection of the data. For
example, given four means, the investigator may decide in advance to
test $\bar{X}_1 - \bar{X}_4$. A posteriori comparisons are formulated after, and may
be suggested by, inspection of the data. From a logical viewpoint 
a priori comparisons may be applied whether or not the F test has led 
to the rejection of the null hypothesis. A posteriori tests are only 
applied following a significant F test. 

When more than one comparison is made, a distinction is drawn
between orthogonal and nonorthogonal comparisons. Orthogonal
comparisons are independent of each other. It is convenient to conceptualize
a comparison as a weighted sum of means. Thus for a set of four means
the difference $\bar{X}_1 - \bar{X}_2$ is a weighted sum of $\bar{X}_1$, $\bar{X}_2$, $\bar{X}_3$, and $\bar{X}_4$, the
weights being 1, -1, 0, 0. Also the difference $\bar{X}_3 - \bar{X}_4$ is a weighted
sum, the weights being 0, 0, 1, -1. In general, for equal n's two
comparisons are said to be orthogonal when the sum of products of the
paired weights is zero. The comparisons $\bar{X}_1 - \bar{X}_2$ and $\bar{X}_3 - \bar{X}_4$ are
orthogonal because (1)(0) + (-1)(0) + (0)(1) + (0)(-1) = 0. If the
sum of products of weights is not zero the comparisons are not
orthogonal. They are not independent of each other. For further discussion
of orthogonality see Chap. 21.
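The orthogonality rule for equal n's reduces to a single sum of products, which is easily mechanized. The short Python sketch below uses the two weight vectors from the text; the third comparison, $\bar{X}_1 - \bar{X}_3$, and the function names are added here purely for illustration.

```python
def sum_of_products(w1, w2):
    """Sum of products of paired comparison weights."""
    return sum(a * b for a, b in zip(w1, w2))


def are_orthogonal(w1, w2):
    """Two comparisons (equal n's) are orthogonal when the sum of products is zero."""
    return sum_of_products(w1, w2) == 0


c_12 = [1, -1, 0, 0]   # weights for mean 1 minus mean 2 (from the text)
c_34 = [0, 0, 1, -1]   # weights for mean 3 minus mean 4 (from the text)
c_13 = [1, 0, -1, 0]   # weights for mean 1 minus mean 3 (illustrative addition)

print(are_orthogonal(c_12, c_34))   # the pair discussed in the text
print(are_orthogonal(c_12, c_13))   # shares mean 1, so the products do not cancel
```

Comparisons that share a mean, such as the first and third above, have a nonzero sum of products and are therefore not independent of each other.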





For selected orthogonal a priori comparisons between pairs of means
we apply a simple t test, or corresponding F test, since here $t^2 = F$, to
the differences between pairs of means, using the within-group variance
estimate $s_w^2$. This estimate is based on a larger number of degrees of
freedom than a variance estimate obtained from two groups only.
Computationally it is more convenient to use F rather than t, obtained
by the formula

$$F = \frac{(\bar{X}_1 - \bar{X}_2)^2}{s_w^2/n_1 + s_w^2/n_2} = \frac{(\bar{X}_1 - \bar{X}_2)^2}{s_w^2 (n_1 + n_2)/n_1 n_2} \qquad (18.12)$$

To illustrate, in Table 18.2 the means for samples I and II are 5.38 and
8.40, respectively. The within-group variance, based on 22 degrees
of freedom, is 3.91. The numbers in the two groups are 8 and 5. The
F test is then

$$F = \frac{(5.38 - 8.40)^2}{3.91(8 + 5)/40} = 7.18$$

We consult a table of F with $df_1 = 1$ and $df_2 = 22$. The values required
for significance at the .05 and .01 levels are 4.30 and 7.94, respectively.
The observed F is between the .05 and .01 levels. Because of its
restriction to selected a priori orthogonal comparisons, the method described
above has limited application.

A variety of methods exist for making selected a posteriori and com- 
plete sets of comparisons. Methods have been developed by Scheffe 
(1953), Duncan (1955, 1957) and others. A useful summary and com- 
parison of these methods is given in Winer (1962). These different 
methods adopt different criteria for the rejection of the null hypothesis. 
The selection of an appropriate criterion in this context is, indeed, a 
thorny problem. For detailed discussion of it the reader is referred to 
Ryan (1959, 1960). 

The method described here is due to Scheffe. This method uses the 
criterion that the probability of rejecting the null hypothesis when it is 
true, a Type I error, should not exceed .01 or .05, for example, for any 
of the comparisons made. This is a very rigorous criterion. To apply 
the method, follow these steps. First, calculate F ratios as in the a priori 
comparison method using formula (18.12). Second, consult a table of F 
and obtain the value of F required for significance at the .05 or .01, or 
any desired level, for $df_1 = k - 1$ and $df_2 = N - k$. Third, calculate
a quantity $F'$, which is k - 1 times the F required for significance at the
desired significance level; that is, $F' = (k - 1)F$. Fourth, compare the
values of F and $F'$. For any difference to be significant at the required
level, F must be greater than or equal to $F'$.

The above method may be illustrated with reference to the data of 





Table 18.2. The means for the four groups are $\bar{X}_1 = 5.38$, $\bar{X}_2 = 8.40$,
$\bar{X}_3 = 6.14$, and $\bar{X}_4 = 3.00$; also, $n_1 = 8$, $n_2 = 5$, $n_3 = 7$, and $n_4 = 6$.
The values of F obtained using formula (18.12) are as follows:


Comparison        F

I, II            7.18
I, III            .55
I, IV            4.97
II, III          3.81
II, IV          20.34
III, IV          8.18


The values of F required for significance at the .05 and .01 levels,
respectively, for $df_1 = 3$ and $df_2 = 22$ are 3.05 and 4.82. The values of $F'$
required for significance at these levels are 9.15 and 14.46. The only
comparison that achieves significance is that between groups II and IV.
For this comparison the significance level is less than .01.
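The four steps of the Scheffe procedure lend themselves to a short program. The Python sketch below is an illustration only: it takes the rounded group means, the group sizes, the within-groups variance estimate of 3.91, and the tabled F values of 3.05 and 4.82 directly from the text, while the dictionary and function names are assumptions. Since it works from the rounded means, an individual F may differ slightly in the second decimal place from a hand-calculated value.

```python
from itertools import combinations

# Quantities taken from the text (Tables 18.2 and 18.3).
means = {"I": 5.38, "II": 8.40, "III": 6.14, "IV": 3.00}
sizes = {"I": 8, "II": 5, "III": 7, "IV": 6}
s_w2 = 3.91                          # within-groups variance estimate, df = 22
k = len(means)

# Tabled F for df1 = k - 1 = 3 and df2 = N - k = 22, as quoted in the text.
f_required = {0.05: 3.05, 0.01: 4.82}
# Scheffe criterion: F' = (k - 1) F.
f_prime = {level: (k - 1) * f for level, f in f_required.items()}


def pair_f(a, b):
    """F ratio of formula (18.12) for the comparison of groups a and b."""
    diff = means[a] - means[b]
    return diff ** 2 / (s_w2 * (sizes[a] + sizes[b]) / (sizes[a] * sizes[b]))


f_values = {pair: pair_f(*pair) for pair in combinations(means, 2)}
significant_05 = [p for p, f in f_values.items() if f >= f_prime[0.05]]
significant_01 = [p for p, f in f_values.items() if f >= f_prime[0.01]]

for pair, f in f_values.items():
    print(pair, round(f, 2))
print("significant at .05:", significant_05)
print("significant at .01:", significant_01)
```

Each pairwise F is computed exactly as for an a priori comparison; only the criterion against which it is judged changes.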

The Scheffe method may be used not only to compare means two 
at a time, but to make any comparison at all. For example, given four 
groups, we may wish to compare the mean for the first two groups
combined, $\bar{X}_{1+2} = (n_1 \bar{X}_1 + n_2 \bar{X}_2)/(n_1 + n_2)$, with the mean for the second
two groups, $\bar{X}_{3+4} = (n_3 \bar{X}_3 + n_4 \bar{X}_4)/(n_3 + n_4)$. The F ratio in this
case is

$$F = \frac{(\bar{X}_{1+2} - \bar{X}_{3+4})^2}{s_w^2/(n_1 + n_2) + s_w^2/(n_3 + n_4)}$$

For a discussion of the application of the Scheffe method to any or all
possible comparisons see Edwards.

The Scheffe method is more rigorous than other multiple comparison 
methods with regard to Type I error. It will lead to fewer significant 
differences. It is easy to apply. No special problems arise because of 
unequal n's. It uses the readily available F test. The criterion it
employs in the evaluation of the null hypothesis is simple and readily 
understood. It is not seriously affected by violations of the assumptions 
of normality and homogeneity of variance, unless these are gross. It 
can be used for making any comparison the investigator wishes to make. 

Concern may attach to the fact that the Scheffe procedure is more 
rigorous than other procedures, and will lead to fewer significant results. 
Because this is so, the investigator may choose to employ a less rigorous
significance level in using the Scheffe procedure; that is, the .10 level
may be used instead of the .05 level. This is Scheffe's recommendation
(1959). Readily available tables of F do not ordinarily contain critical
values at the .10 level. Tables showing the .10 critical values are given 
in Fisher and Yates and also in Winer 





EXERCISES 

1 How many degrees of freedom are associated with the variation in the
data for (a) a comparison of two means for independent samples, each
containing 20 cases, (b) a comparison of four means for independent
samples, each containing 14 cases, (c) a comparison of four means for
independent samples of size 10, 16, 18, and 11, respectively?

2 The following are measurements obtained for five equal groups of 
subjects: 


Group       Measurements                 Mean

I            4    7    9    9   14       8.60
II           5    6   12   12    7       8.40
III         15   18   21   26   20      20.00
IV          35   27   29   30   25      29.20
V           17   26   17   20   12      18.40

Apply the analysis of variance to test the null hypothesis
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5$

3 The following are error scores on a psychomotor test for four groups 
of subjects tested under four experimental conditions 

Group       Error scores                           Mean

I           16    7   19   24   31                19.40
II          24    6   15   25   32   24   29      22.14
III         16   15   18   19    6   13   18      15.00
IV          25   19   16   17   42   45           27.33

Apply the analysis of variance to test the null hypothesis
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$

4 Apply the analysis of variance to test the significance of the difference 
between means for the following data: 

                    I         II        III

n                  10         10         10
Mean             7.40       8.30      10.56
Sum of X^2        649        755      1,263






5 The following are test scores for two groups. 


Group       Scores                                 Mean

I           12   20   31   12   14   16   16      17.29
II           8   14   29    7   14    6           13.00

Calculate $s_w^2$, $s_b^2$, and F. Test the null hypothesis $H_0: \mu_1 = \mu_2$.

6 What assumptions underlie the analysis of variance?

7 Compare means two at a time for the data of Exercise 2 above using
Scheffe's procedure. Indicate which comparisons are significant at
the .01 level.

8 Compare means two at a time for the data of Exercise 3 above using
Scheffe's procedure. Indicate which comparisons are significant at
the .01 level.



Analysis of Variance: 

Two-way 

Classification 


19 


19.1

Introduction


Experiments may be designed to permit the simultaneous investigation 
of two experimental variables. Such experiments involve two bases of 
classification. To illustrate, assume that an investigator wishes to study 
the effects of two methods of presenting nonsense syllables on recall after 
5 min, 1 hr, and 24 hr. One experimental variable is method of pre- 
sentation, the other the interval between presentation and recall. There 
are six combinations of experimental conditions. One method of con- 
ducting such an experiment is to select a group of subjects and allocate 
these at random to the experimental conditions, an equal number being 
assigned to each condition. With, say, 10 subjects allocated to each 
experimental condition, the total number of subjects will be 

2 X 3 X 10 = 60 


The data may be arranged in a table containing two rows and three 
columns. The rows correspond to methods, the columns to time
intervals. The 10 observations for each group may be entered in each cell
of the table. Differences in the means of the rows result from differences 
in recall under the two methods of presentation. Differences in the 
means of the columns result from differences in recall after the three 
time intervals. 

Experiments with two-way classification may be conducted with 
only one sampling unit, and measurement, for each experimental con- 
dition. With one measurement for each experimental condition the 
total sum of squares is partitioned into three parts, a between-rows, a 
between-columns, and an interaction sum of squares. With more than 
one measurement for each experimental condition, the total sum of 
squares is partitioned into four parts, a between-rows, a between-columns, 
an interaction sum of squares, and a within-cells sum of squares. Each 
sum of squares has an associated number of degrees of freedom. By 
dividing the sums of squares by the associated degrees of freedom four 





variance estimates are obtained. These variance estimates are used 
to test the significance of the differences between row means, column 
means, and, with more than one measurement per cell, the interaction 
effect. 


19.2 

Notation for two-way analysis of variance 


Consider an experiment involving R experimental treatments of one 
variable and C experimental treatments of another variable. The num- 
ber of treatment combinations is RC. Let us consider the particular 
case where we have one sampling unit, and one measurement, for each 
of the RC treatment combinations. The total number of measurements 
is RC = N. The data may be represented as follows: 


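$$
\begin{array}{lcccccc}
 & 1 & 2 & 3 & \cdots & C & \text{Row mean} \\
 & X_{11} & X_{12} & X_{13} & \cdots & X_{1C} & \bar{X}_{1.} \\
 & X_{21} & X_{22} & X_{23} & \cdots & X_{2C} & \bar{X}_{2.} \\
 & X_{31} & X_{32} & X_{33} & \cdots & X_{3C} & \bar{X}_{3.} \\
 & \vdots & \vdots & \vdots & & \vdots & \vdots \\
 & X_{R1} & X_{R2} & X_{R3} & \cdots & X_{RC} & \bar{X}_{R.} \\
\text{Column mean} & \bar{X}_{.1} & \bar{X}_{.2} & \bar{X}_{.3} & \cdots & \bar{X}_{.C} & \bar{X}_{..}
\end{array}
$$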


Double subscripts are used. The first subscript identifies the row;
the second subscript identifies the column. Thus $X_{32}$ is the measurement
in the third row and the second column. Usually $X_{rc}$ denotes the
measurement in the rth row and cth column, where r = 1, 2, . . . , R
and c = 1, 2, . . . , C. A dot notation is used to identify means. The
symbol $\bar{X}_{1.}$ refers to the mean of the first row, $\bar{X}_{2.}$ the mean of the second
row, and $\bar{X}_{r.}$ the mean of the rth row. Similarly, $\bar{X}_{.1}$ refers to the mean
of the first column, $\bar{X}_{.2}$ the mean of the second column, and $\bar{X}_{.c}$ the mean
of the cth column. The grand mean, the mean of all N observations,
is $\bar{X}_{..}$. The total sum of squares of deviations about the grand mean
is given by

$$\sum_{r=1}^{R} \sum_{c=1}^{C} (X_{rc} - \bar{X}_{..})^2$$

Consider now a situation where we have n sampling units and n 
measurements for each of the RC treatment combinations. The total 






number of measurements is nRC = N. Where R = 2, C = 3, and 
n = 3, the data may be represented as follows: 

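$$
\begin{array}{lcccc}
 & 1 & 2 & 3 & \text{Row mean} \\
 & X_{111} & X_{121} & X_{131} & \\
1 & X_{112} & X_{122} & X_{132} & \bar{X}_{1..} \\
 & X_{113} & X_{123} & X_{133} & \\
 & X_{211} & X_{221} & X_{231} & \\
2 & X_{212} & X_{222} & X_{232} & \bar{X}_{2..} \\
 & X_{213} & X_{223} & X_{233} & \\
\text{Column mean} & \bar{X}_{.1.} & \bar{X}_{.2.} & \bar{X}_{.3.} & \bar{X}_{...}
\end{array}
$$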


Triple subscripts are used. The first subscript identifies the row,
the second the column, and the third the measurement within the cell.
Thus $X_{231}$ means the measurement for the first individual in the second
row and the third column. In general, $X_{rci}$ denotes the measurement for
the ith individual in the rth row and cth column, where i = 1, 2, . . . ,
n. Row, column, and cell means are identified by a dot notation. The
mean of all the observations in the first row is $\bar{X}_{1..}$. The mean of all the
observations in the rth row is $\bar{X}_{r..}$. Similarly, the mean of the first
column is $\bar{X}_{.1.}$ and of the cth column $\bar{X}_{.c.}$. The mean of all the
observations in the cell corresponding to the rth row and cth column is $\bar{X}_{rc.}$.
The mean of all nRC observations, the grand mean, is $\bar{X}_{...}$. The total
sum of squares of all observations about the grand mean is

$$\sum_{r=1}^{R} \sum_{c=1}^{C} \sum_{i=1}^{n} (X_{rci} - \bar{X}_{...})^2$$

The sum of squares of deviations about the grand mean, both where n = 1
and n > 1, is partitioned into additive components.

19.3

Partitioning the sum of squares 

With one measurement only for each of the RC treatment combinations,
the total sum of squares may be partitioned into three additive
components, a between-rows, a between-columns, and an interaction sum of
squares. We proceed by writing the identity

$$(X_{rc} - \bar{X}_{..}) = (\bar{X}_{r.} - \bar{X}_{..}) + (\bar{X}_{.c} - \bar{X}_{..}) + (X_{rc} - \bar{X}_{r.} - \bar{X}_{.c} + \bar{X}_{..})$$

This identity states that the deviation of an observation from the grand 
mean may be viewed as composed of three parts, a deviation of the row 






mean from the grand mean, a deviation of the column mean from the 
grand mean, and a remainder, or residual term, known as an interaction 
term. By squaring both sides of the above identity, an expression is 
obtained containing six terms. This may be summed over R rows and 
C columns. Three of these terms conveniently vanish, because they 
contain a sum of deviations about a mean, which, of course, is zero. The 
resulting total sum of squares may be written as 

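$$\sum_{r=1}^{R} \sum_{c=1}^{C} (X_{rc} - \bar{X}_{..})^2 = C \sum_{r=1}^{R} (\bar{X}_{r.} - \bar{X}_{..})^2 + R \sum_{c=1}^{C} (\bar{X}_{.c} - \bar{X}_{..})^2 + \sum_{r=1}^{R} \sum_{c=1}^{C} (X_{rc} - \bar{X}_{r.} - \bar{X}_{.c} + \bar{X}_{..})^2$$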

The first term to the right is C times the sum of squares of deviations of 
row means from the grand mean. This is the between-rows sum of 
squares. It describes the variation in row means. The second term is 
R times the sum of squares of deviations of column means from the grand 
mean. This is the between-columns sum of squares. It describes variation
in column means. The third term is a residual, or interaction, sum
of squares. The meaning of the interaction term is discussed in detail 
in Sec. 19.5. 

With n measurements for each of the RC treatment combinations
the total sum of squares may be partitioned into four additive
components. These are a between-rows, a between-columns, an interaction,
and a within-cells sum of squares. In this situation we write the identity

$$(X_{rci} - \bar{X}_{...}) = (\bar{X}_{r..} - \bar{X}_{...}) + (\bar{X}_{.c.} - \bar{X}_{...}) + (\bar{X}_{rc.} - \bar{X}_{r..} - \bar{X}_{.c.} + \bar{X}_{...}) + (X_{rci} - \bar{X}_{rc.})$$

This expression may be squared and summed over rows, columns, and
within cells. All but four terms vanish, and the resulting total sum of
squares may be written as

$$\sum_{r=1}^{R} \sum_{c=1}^{C} \sum_{i=1}^{n} (X_{rci} - \bar{X}_{...})^2 = nC \sum_{r=1}^{R} (\bar{X}_{r..} - \bar{X}_{...})^2 + nR \sum_{c=1}^{C} (\bar{X}_{.c.} - \bar{X}_{...})^2 + n \sum_{r=1}^{R} \sum_{c=1}^{C} (\bar{X}_{rc.} - \bar{X}_{r..} - \bar{X}_{.c.} + \bar{X}_{...})^2 + \sum_{r=1}^{R} \sum_{c=1}^{C} \sum_{i=1}^{n} (X_{rci} - \bar{X}_{rc.})^2$$





The first term to the right is descriptive of the variation of row means, 
the second of column means, and the third of interaction. The fourth 
term is the within-cells sum of squares. It is the sum of squares of the 
deviations of observations from the means of the cells to which they 
belong. 
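The four-part partition can be verified numerically for any table with an equal number of entries per cell. The following Python sketch assumes a hypothetical 2 by 3 table with n = 3 scores per cell; the scores and variable names are invented for illustration and do not appear in the text.

```python
# data[r][c] holds the n scores in row r, column c (hypothetical values).
data = [
    [[3, 5, 4], [6, 8, 7], [9, 9, 12]],
    [[2, 4, 3], [7, 9, 8], [5, 6, 4]],
]
R, C, n = len(data), len(data[0]), len(data[0][0])


def mean(xs):
    """Arithmetic mean of a list of numbers."""
    return sum(xs) / len(xs)


cell = [[mean(data[r][c]) for c in range(C)] for r in range(R)]
row = [mean([x for c in range(C) for x in data[r][c]]) for r in range(R)]
col = [mean([x for r in range(R) for x in data[r][c]]) for c in range(C)]
grand = mean([x for r in range(R) for c in range(C) for x in data[r][c]])

# The four additive components of the total sum of squares.
ss_rows = n * C * sum((row[r] - grand) ** 2 for r in range(R))
ss_cols = n * R * sum((col[c] - grand) ** 2 for c in range(C))
ss_inter = n * sum(
    (cell[r][c] - row[r] - col[c] + grand) ** 2
    for r in range(R) for c in range(C)
)
ss_within = sum(
    (x - cell[r][c]) ** 2
    for r in range(R) for c in range(C) for x in data[r][c]
)

ss_total = sum(
    (x - grand) ** 2
    for r in range(R) for c in range(C) for x in data[r][c]
)

print(ss_rows, ss_cols, ss_inter, ss_within, ss_total)
```

The additivity checked here depends on the cells being of equal size; with unequal cell frequencies the cross-product terms do not all vanish.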


19.4 

Variance estimates or mean squares 

With a single entry in each cell, n = 1 and RC = N. The number of
degrees of freedom associated with the total sum of squares is

RC - 1 = N - 1

The numbers of degrees of freedom associated with row and column
sums are R - 1 and C - 1, respectively. The number of degrees of
freedom associated with the interaction sum of squares is (R - 1)(C - 1).
The degrees of freedom are additive, and

N - 1 = (R - 1) + (C - 1) + (R - 1)(C - 1)
total    row       column    interaction

The sums of squares are divided by the associated degrees of freedom to
obtain three variance estimates, or mean squares. The between-rows,
between-columns, and interaction variance estimates are $s_r^2$, $s_c^2$, and $s_{rc}^2$,
respectively.

With n entries in each cell, where n > 1, the total number of
observations is nRC = N. The number of degrees of freedom associated with
the total sum of squares is nRC - 1 = N - 1. The numbers of degrees
of freedom associated with row, column, and interaction sums of squares
are R - 1, C - 1, and (R - 1)(C - 1), respectively. The number of
degrees of freedom associated with the within-cells sum of squares is
nRC - RC = RC(n - 1). Because the deviations are taken about the
cell means, 1 degree of freedom is lost for each cell. In each cell n - 1
deviations are free to vary. The number of degrees of freedom for RC
cells, therefore, is RC(n - 1). The degrees of freedom are additive.
The sums of squares are divided by the associated degrees of freedom to
obtain the variance estimates, or mean squares.
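The bookkeeping of degrees of freedom is easily checked by a short calculation. The Python sketch below uses an illustrative function name, `two_way_df`, and applies it to the design described in the introduction to this chapter, with 2 methods, 3 time intervals, and 10 subjects per condition.

```python
def two_way_df(R, C, n):
    """Degrees of freedom for a two-way classification with n entries per cell."""
    N = n * R * C
    return {
        "rows": R - 1,
        "columns": C - 1,
        "interaction": (R - 1) * (C - 1),
        "within": R * C * (n - 1),   # zero when n = 1
        "total": N - 1,
    }


df = two_way_df(R=2, C=3, n=10)
parts = df["rows"] + df["columns"] + df["interaction"] + df["within"]
print(df, parts == df["total"])
```

With n = 1 the within-cells entry is zero, and the remaining three components add to RC - 1.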

Table 19.1 shows in summary form the sum of squares, degrees of 
freedom, and variance estimates for a two-way classification with n 
entries per cell. 

F ratios are formed from the variance estimates and used to test the 
significance of row, column, and, where n > 1, interaction effects. The 
correct procedure here, and the interpretation of the variance estimates, 





depends on the statistical model appropriate for the experiment. Three 
models may be identified: fixed, random, and mixed. The investigator 
must decide which model fits his experiment. This decision determines 
how the variance estimates are used in the application of tests of signifi- 
cance to the data. Before proceeding with a discussion of these models 
(Sec. 19.6), the meaning of the interaction term is discussed. 

19.5

The nature of interaction 

The algebraic partitioning of sums of squares in a two-way classification, 
where n > 1, leads to the interaction term 

$$n \sum_{r=1}^{R} \sum_{c=1}^{C} (\bar{X}_{rc.} - \bar{X}_{r..} - \bar{X}_{.c.} + \bar{X}_{...})^2$$

The nature of interaction may be illustrated by example. Consider a 
simple agricultural experiment with two varieties of wheat and two types 
of fertilizer. Assume that one variety of wheat has a higher yield than 
the other. If the yield is uniformly higher regardless of which fertilizer 
is used, then there is no interaction between the two experimental 


variables. If, however, one variety produces a relatively higher yield with
one type of fertilizer than with the other, then the two variables may
be said to interact. To illustrate further, assume that we have two
methods of teaching arithmetic and two teachers. Each teacher uses
the two methods on separate groups of pupils. The achievement of the
pupils is measured. If one method of instruction is uniformly superior
or inferior regardless of which teacher uses it, then there is no interaction
between methods and teacher. If, however, one teacher obtains better
results with one method than the other, and the opposite holds for the
other teacher, then teachers and methods may be said to interact.

Table 19.1

Analysis of variance for two-way classification
with n entries per cell: n > 1

$$
\begin{array}{llll}
\text{Source} & \text{Sum of squares} & \text{Degrees of freedom} & \text{Variance estimate} \\
\text{Rows} & nC \sum_{r=1}^{R} (\bar{X}_{r..} - \bar{X}_{...})^2 & R - 1 & s_r^2 \\
\text{Columns} & nR \sum_{c=1}^{C} (\bar{X}_{.c.} - \bar{X}_{...})^2 & C - 1 & s_c^2 \\
\text{Interaction} & n \sum_{r=1}^{R} \sum_{c=1}^{C} (\bar{X}_{rc.} - \bar{X}_{r..} - \bar{X}_{.c.} + \bar{X}_{...})^2 & (R - 1)(C - 1) & s_{rc}^2 \\
\text{Within cells} & \sum_{r=1}^{R} \sum_{c=1}^{C} \sum_{i=1}^{n} (X_{rci} - \bar{X}_{rc.})^2 & RC(n - 1) & s_w^2 \\
\text{Total} & \sum_{r=1}^{R} \sum_{c=1}^{C} \sum_{i=1}^{n} (X_{rci} - \bar{X}_{...})^2 & nRC - 1 &
\end{array}
$$

Table 19.2 shows observed cell means for a two-way classification 
with three categories for each of the experimental variables. The 
observed cell entries are means based on an equal number of cases. What
are the expected cell means on the assumption of zero interaction? This 
situation is somewhat analogous to the calculation of expected values 
for contingency tables. For a contingency table we calculate expected 
cell frequencies. Here we are required to calculate the expected cell 
means on the assumption that the two experimental variables function 
independently. 

Assuming zero interaction, certain constant differences will be main- 
tained between cell means. In Table 19.2 the observed row mean for A 
is 10 points less than the row mean for B. If the interaction were zero, 
we should expect a constant 10-point difference to occur between means 
for A and B under treatments I, II, and III. A similar relationship
would be expected on comparing all other rows and columns of this 
table. Obviously, the observed values in Table 19.2 do not exhibit this 
characteristic. The interaction is not zero. 

Where the interaction is zero, a deviation of a cell mean from the
mean of the row (or column) to which it belongs will be equal to the
deviation of its column (or row) mean from the grand mean. If $\bar{X}_{rc.}$ is
a cell mean and $\bar{X}_{r..}$ and $\bar{X}_{.c.}$ are its corresponding row and column
means, then under zero interaction, $\bar{X}_{rc.} - \bar{X}_{r..} = \bar{X}_{.c.} - \bar{X}_{...}$. Thus
the expected value of $\bar{X}_{rc.}$ under zero interaction is given by

$$E(\bar{X}_{rc.}) = \bar{X}_{r..} + \bar{X}_{.c.} - \bar{X}_{...}$$

These expected values have been calculated for the observed data of Table
19.2 and are shown to the right of the table. On comparing the expected
values in any two rows or columns, note the constant increment or
decrement. If $\bar{X}_{rc.}$ is an observed and $E(\bar{X}_{rc.})$ an expected value, the deviation
of an observed from an expected value is $\bar{X}_{rc.} - \bar{X}_{r..} - \bar{X}_{.c.} + \bar{X}_{...}$.
The interaction term in the analysis of variance is n times the sum of
squares of deviations of the observed cell means from the expected cell
means.

Table 19.2

Comparison of observed cell means and means expected
under zero interaction

           Observed cell means               Expected cell means

           I     II    III    Mean           I     II    III    Mean

A          2     12     16     10      A     4     16     10     10
B          6     20     34     20      B    14     26     20     20
C         64     76     40     60      C    54     66     60     60

Mean      24     36     30     30    Mean   24     36     30     30
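The expected cell means of Table 19.2 follow mechanically from the row means, column means, and grand mean. The Python sketch below reproduces them from the observed cell means given in the table; the variable names are illustrative, and since the number of cases per cell is not stated, the code stops at the sum of squared deviations rather than multiplying by n.

```python
# Observed cell means of Table 19.2: rows A, B, C; columns I, II, III.
observed = [
    [2, 12, 16],
    [6, 20, 34],
    [64, 76, 40],
]
R, C = len(observed), len(observed[0])

row_means = [sum(observed[r]) / C for r in range(R)]
col_means = [sum(observed[r][c] for r in range(R)) / R for c in range(C)]
grand_mean = sum(sum(r) for r in observed) / (R * C)

# Expected cell mean under zero interaction: row mean + column mean - grand mean.
expected = [
    [row_means[r] + col_means[c] - grand_mean for c in range(C)]
    for r in range(R)
]

# Deviation of each observed cell mean from its expected value.
deviations = [
    [observed[r][c] - expected[r][c] for c in range(C)]
    for r in range(R)
]
sum_sq_dev = sum(d ** 2 for row in deviations for d in row)

for label, exp_row in zip("ABC", expected):
    print(label, exp_row)
print("sum of squared deviations:", sum_sq_dev)
```

Multiplying the sum of squared deviations by the number of cases per cell would give the interaction sum of squares for these data.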


19.6 

Finite, random, fixed, and mixed models 

Different authors recommend different procedures for testing row, col- 
umn, and interaction effects in a two-way analysis of variance. Diffi- 
culties associated with the selection of the appropriate procedure are 
resolved by the recognition of a general statistical model underlying the 
analysis of variance. This model is referred to here as the finite model. 
Three particular cases of the finite model may be identified. These are 
the random, fixed, and mixed models. The models appropriate for 
different experiments differ. The investigator must decide which model 
best represents his experiment. The choice of model determines the 
procedure for testing row, column, and interaction effects. The choice 
of model depends on the nature of the variables used as the basis of 
classification in the experimental design. 

The general finite model makes the linearity assumption that a 
deviation of an observation $X_{rci}$ from the population value of the grand 
mean $\mu$ may be expressed in the form 

$$X_{rci} - \mu = a_r + b_c + (ab)_{rc} + e_{rci} \tag{19.2}$$

The four quantities to the right are in deviation form. Thus 

$$a_r = \mu_{r..} - \mu$$

a deviation of the population value of the row mean from the grand 
mean $\mu$. Similarly, $b_c = \mu_{.c.} - \mu$, a deviation of a column mean from 
the grand mean. The interaction term $(ab)_{rc} = (\mu_{rc.} - \mu_{r..} - \mu_{.c.} + \mu)$, 
and the error term $e_{rci} = X_{rci} - \mu_{rc.}$. Where this model is used to 
represent experimental data the implicit assumption is made that treatment 
effects can meaningfully be partitioned into additive components for each 
sampling unit. Because $a_r$, $b_c$, $(ab)_{rc}$, and $e_{rci}$ are in deviation form, they 
sum to zero. The population variances of the four components are 
$\sigma_a^2$, $\sigma_b^2$, $\sigma_{ab}^2$, and $\sigma_e^2$. 

The null hypothesis under test, for example, for row effects is 
$H_r: \mu_{1..} = \mu_{2..} = \cdots = \mu_{R..}$. This hypothesis may be stated in the 
form $H_r: \sigma_a^2 = 0$. Similarly, the null hypotheses for column and 
interaction effects may be stated as $H_c: \sigma_b^2 = 0$ and $H_{rc}: \sigma_{ab}^2 = 0$. We wish 
to obtain from the experimental data information which will provide 
a valid test of these hypotheses. 

We now consider an actual experiment involving $R$ levels of one 
variable and $C$ levels of another. The $R$ and $C$ levels may be regarded 
as samples drawn at random from two populations of levels comprised of 
$R_p$ and $C_p$ members, respectively. Thus we conceptualize two 
populations of levels. The levels used in a particular experiment are 
construed to be drawn at random from these two populations. $R_p$, $R$, $C_p$, 
and $C$ may take any integral values, provided, of course, that $R \le R_p$ 
and $C \le C_p$. The $RC$ treatment combinations are assigned at random 
to the $nRC$ sampling units or individuals. Under these conditions, and 
given the basic linearity assumption, Wilk and Kempthorne (1955) have 
shown that the expectations of the mean squares for the general finite 
model are as shown in Table 19.3. 

Thus the mean squares provide estimates of variance components, 
and these are used to test the significance of row, column, and interaction 
effects. How these are used depends on a consideration of three par- 
ticular instances of the general finite model. 

Table 19.3 

Expectation of mean squares for general finite model: two-way 
analysis of variance with n entries in each cell: n > 1 

Mean square              Expectation of mean square
Row, $s_r^2$             $\sigma_e^2 + \frac{C_p - C}{C_p}\, n\sigma_{ab}^2 + nC\sigma_a^2$
Column, $s_c^2$          $\sigma_e^2 + \frac{R_p - R}{R_p}\, n\sigma_{ab}^2 + nR\sigma_b^2$
Interaction, $s_i^2$     $\sigma_e^2 + n\sigma_{ab}^2$
Within cells, $s_w^2$    $\sigma_e^2$

Consider an experiment involving $R$ levels of one variable and $C$ of 
another, these being regarded as random samples of levels from 
populations comprised of $R_p$ and $C_p$ members. We may consider a case 
where $R_p$ and $C_p$ are very large, so that $R_p \gg R$ and $C_p \gg C$, where $\gg$ 
denotes much greater than. Under these circumstances such terms as 
$(R_p - R)/R_p$ and $(C_p - C)/C_p$ approach unity. When this is so we 
have what is referred to as a random model situation. The expectations 
for the random model are obtained by substituting $(R_p - R)/R_p = 1$ 
and $(C_p - C)/C_p = 1$ in the expectations of the mean squares for the 
general finite model given in Table 19.3. Thus the random model is a 
particular case of the finite model. 

In psychological research, experiments where the random model is 
appropriate are not numerous. Satisfactory examples are not readily 
found. One example is an experiment where each member of a sample 
of R job applicants is assigned a rating by each member of a sample of C 
interviewers. Here both job applicants and interviewers may be viewed 
as samples drawn at random from populations such that $R_p \gg R$ and 
$C_p \gg C$. 

In many experiments the R levels of one variable and the C levels of 
the other are not conceptualized as random samples. In agricultural 
experiments where R varieties of wheat and C varieties of fertilizer are 
used, the investigator is usually concerned with the yield of particular 
wheat varieties and with the effect of particular fertilizers on yield. He 
is not concerned with drawing inferences about hypothetical populations 
of wheat and fertilizer varieties. Both variables or factors are fixed. 
Any factor is fixed if the investigator on repeating the experiment would 
use the same levels of it. Under the fixed model $R = R_p$ and $C = C_p$. 
By substituting $(R_p - R)/R_p = 0$ and $(C_p - C)/C_p = 0$ in the 
expectations of the mean squares for the finite model given in Table 19.3, the 
expectations for the fixed model are obtained. 

In psychological experiments different methods of learning, 
environmental conditions, methods of inducing stress, and the like, are examples 
of fixed factors or variables. In many experiments different levels of 
the experimental variable are introduced, e.g., levels of illumination, time 
intervals, size of brain lesion, and dosages of a drug. While the levels 
may be thought to constitute a representative set and interpolation 
between levels may be possible, such variables are usually regarded as 
fixed. Of course it is possible to conceptualize a study where, for 
example, levels of illumination or dosages of a drug are sampled at 
random from a population of levels or dosages. Ordinarily, however, 
experiments are not designed in this way. 

In many experiments one basis of classification is a random factor or 
variable and the other is fixed. Measurements may be obtained for a 
sample of $R$ individuals for each of $C$ treatments or experimental 
conditions. Here one basis of classification is random and the other is fixed. 
This is a mixed model. In the mixed-model situation either $R_p = R$ 
and $C_p \gg C$ or $R_p \gg R$ and $C_p = C$. By substituting $(R_p - R)/R_p = 1$ 
and $(C_p - C)/C_p = 0$, or vice versa, in the expectations for the finite 
model of Table 19.3, we obtain the expectations for the mixed model. 

Table 19.3 may be used to provide the required expectations for a 
two-way classification where $n = 1$. Under this circumstance no 
within-cells variance estimate $s_w^2$ is available. The expectations for row, 
column, and interaction effects for the random, fixed, and mixed models 
are obtained by writing $n = 1$ and substituting the appropriate values 
of $(R_p - R)/R_p$ and $(C_p - C)/C_p$. 
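
As a worked illustration, the substitutions described in this section may be written out for the row and column mean squares. The block below simply applies the stated limiting values to the general expectations of Table 19.3; the mixed case shown assumes rows random and columns fixed, and the symbols $s_i^2$ and $s_w^2$ stand for the interaction and within-cells variance estimates.

```latex
% Random model: (R_p - R)/R_p = 1 and (C_p - C)/C_p = 1
\begin{aligned}
E(s_r^2) &= \sigma_e^2 + n\sigma_{ab}^2 + nC\sigma_a^2 \\
E(s_c^2) &= \sigma_e^2 + n\sigma_{ab}^2 + nR\sigma_b^2
\end{aligned}

% Fixed model: (R_p - R)/R_p = 0 and (C_p - C)/C_p = 0
\begin{aligned}
E(s_r^2) &= \sigma_e^2 + nC\sigma_a^2 \\
E(s_c^2) &= \sigma_e^2 + nR\sigma_b^2
\end{aligned}

% Mixed model, rows random and columns fixed:
% (R_p - R)/R_p = 1 and (C_p - C)/C_p = 0
\begin{aligned}
E(s_r^2) &= \sigma_e^2 + nC\sigma_a^2 \\
E(s_c^2) &= \sigma_e^2 + n\sigma_{ab}^2 + nR\sigma_b^2
\end{aligned}

% In every case
\begin{aligned}
E(s_i^2) &= \sigma_e^2 + n\sigma_{ab}^2 \\
E(s_w^2) &= \sigma_e^2
\end{aligned}
```

The interaction component $n\sigma_{ab}^2$ thus enters the expectation for a factor only when the other factor is random.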

19.7 

Choice-of-error term 

By choice-of-error term is meant the selection of the appropriate variance 
estimate for the denominator of the F ratio in testing row, column, and 
interaction effects. In general, in forming an F ratio, the expectation of 
the variance estimate in the numerator should contain one term more 
than the expectation of the variance estimate in the denominator, the 
additional term involving the effect under test. On applying this prin- 
ciple to the expectations of Table 19.3, the following rules may be 
formulated : 

1 Random model: n > 1. The proper error term for testing the 
interaction effect is $s_w^2$. $F_i = s_i^2/s_w^2$. The correct error term for testing 
row and column effects is $s_i^2$. $F_r = s_r^2/s_i^2$ and $F_c = s_c^2/s_i^2$. 

2 Fixed model: n > 1. The proper error term is $s_w^2$ for interaction, 
row, and column effects. The three F ratios are $F_i = s_i^2/s_w^2$, 
$F_r = s_r^2/s_w^2$, and $F_c = s_c^2/s_w^2$. 

3 Mixed model: n > 1. The proper error term for testing the 
interaction effect is $s_w^2$. $F_i = s_i^2/s_w^2$. When R is random and C is fixed, 
the proper error term for testing row effects is $s_w^2$. $F_r = s_r^2/s_w^2$. The 
proper term for testing column effects is $s_i^2$. $F_c = s_c^2/s_i^2$. When R is 
fixed and C is random, the converse procedure applies. $F_r = s_r^2/s_i^2$, 
and $F_c = s_c^2/s_w^2$. 

4 Random model: n = 1. No $s_w^2$ is available. The correct error term 
for testing both row and column effects is $s_i^2$. $F_r = s_r^2/s_i^2$ and 
$F_c = s_c^2/s_i^2$. 





5 Fixed model: n = 1. The point of view may be adopted that no 
test of either row or column effects can be made. This point of 
view requires some modification. The ratio $s_r^2/s_i^2$ is an estimate of 
$(\sigma_e^2 + C\sigma_a^2)/(\sigma_e^2 + \sigma_{ab}^2)$ and will, where $\sigma_{ab}^2 > 0$, be an underestimate 
of $(\sigma_e^2 + C\sigma_a^2)/\sigma_e^2$. This means that if a significant result is obtained, 
the investigator knows a fortiori that the effect tested is significant. 
If the result is not significant, the probability of accepting the null 
hypothesis, $H_0: \sigma_a^2 = 0$, when it is false, may be high. Thus in the 
absence of significance no conclusions should be drawn from the data. 

6 Mixed model: n = 1. When R is random and C is fixed, the 
situation pertaining to the testing of row effects is as described above for 
the fixed model, n = 1. The proper error term for the column effect 
is $s_i^2$. $F_c = s_c^2/s_i^2$. When C is random and R is fixed, the argument 
relating to the fixed model, n = 1, again applies. The proper error 
term for the row effect is $s_i^2$. $F_r = s_r^2/s_i^2$. 


The above rules, excluding the modifications of rules 5 and 6 above, 
can be very simply obtained by using the following schema for the 
proper choice-of-error term. 

Row            $\frac{C}{C_p}\, s_w^2 + \frac{C_p - C}{C_p}\, s_i^2$
Column         $\frac{R}{R_p}\, s_w^2 + \frac{R_p - R}{R_p}\, s_i^2$
Interaction    $s_w^2$

For the random model, $(C_p - C)/C_p = 1$ and $C/C_p = 0$. The proper 
error term for row and column effects is $s_i^2$. Similarly, the proper error 
term for the fixed and mixed models may be obtained. When $n = 1$, 
all terms containing $s_w^2$ vanish. For the random model, $s_i^2$ becomes the 
correct error term for row and column effects. For the fixed model, no 
tests are possible. When rows are random and the columns are fixed, 
the column effect may be tested, but not the row. 
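
The rules of this section can be restated as a small lookup function. The sketch below is illustrative only: the function name and the labels "within" (for $s_w^2$) and "interaction" (for $s_i^2$) are hypothetical, and `None` marks the cases where no exact test is available, leaving aside the conservative test described under rule 5.

```python
def error_terms(rows_random, cols_random, n):
    """Return the denominator variance estimate for each F ratio.

    "within" stands for the within-cells estimate, "interaction" for the
    interaction estimate, and None means no exact test is available.
    """
    has_within = n > 1

    def for_effect(other_factor_random):
        # The interaction component enters the expectation of a mean
        # square only when the *other* factor is random.
        if other_factor_random:
            return "interaction"
        return "within" if has_within else None

    return {
        "rows": for_effect(cols_random),
        "columns": for_effect(rows_random),
        "interaction": "within" if has_within else None,
    }


# Mixed model with rows random and columns fixed, one entry per cell.
print(error_terms(rows_random=True, cols_random=False, n=1))
# Random model with several entries per cell.
print(error_terms(rows_random=True, cols_random=True, n=8))
```

Encoding the rule in terms of whether the other factor is random mirrors the role of $(C_p - C)/C_p$ and $(R_p - R)/R_p$ in the expectations of the mean squares.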


19.8 

Pooling sums of squares : n > 1 

Under certain circumstances the within-cells and interaction sums of 
squares may be added together and divided by the combined degrees of 
freedom to obtain an estimate of variance based on a larger number of 
degrees of freedom. Caution should be exercised in applying this 
procedure. 

For the fixed model, the within-cells variance estimate is the proper 
error term for testing interaction, row, and column effects. For the 
random model, the interaction variance estimate is the proper error term 
for testing row and column effects. These procedures are always correct. 
For both models, when the interaction is quite clearly not significant, the 
within-cells and interaction sums of squares may be pooled to obtain a 
variance estimate for the denominator of the F ratio based on a larger 
number of degrees of freedom. Of course, when row and column effects 
are clearly significant, when tested without pooling, the pooling 
procedure is unnecessary. 

When doubt exists as to the significance of the interaction, the 
investigator may or may not choose to pool the sums of squares. If the 
interaction effect in fact exists, $\sigma_{ab}^2$ being greater than zero, and terms 
are pooled, the pooling may be said to be erroneous. 

For the fixed model, erroneous pooling will increase the size of the 
error term. For the random model, erroneous pooling will decrease the 
size of the error term. In both instances the number of degrees of 
freedom is increased. Erroneous pooling will for the fixed 
model usually lead to too few significant results and for the random 
model to too many significant results. 

For the mixed model, when rows are random and columns are fixed, 
pooling may be applied with nonsignificant interaction. In this situation 
erroneous pooling will tend to make the error term too large for testing 
row effects and too small for testing column effects, leading to too few 
significant effects for rows and too many for columns. 

An understanding of the consequences of pooling sums of squares for 
fixed, mixed, and random models, when interaction does exist, that is, 
when $\sigma_{ab}^2 > 0$, may be obtained by examination of the expectations of 
the variance estimates given in Table 19.3. Quite clearly, for the fixed 
model, when $\sigma_{ab}^2 > 0$, combining interaction and within cells will lead 
to an error term whose expectation is greater than $\sigma_e^2$. Consequently, 
too few significant results will be obtained. 

In general, it is probably advisable not to pool unless the investigator 
is quite confident that the interaction is not significant. For a detailed 
discussion of this rather troublesome problem, see Binder (1955). 
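
The arithmetic of pooling is shown in the following sketch. The function name is hypothetical, and the figures used are the interaction and within-cells values from Table 19.8 of Sec. 19.11, where the interaction is clearly not significant.

```python
def pooled_error(ss_interaction, df_interaction, ss_within, df_within):
    """Pool interaction and within-cells sums of squares.

    Returns the pooled variance estimate and its degrees of freedom.
    """
    pooled_ss = ss_interaction + ss_within
    pooled_df = df_interaction + df_within
    return pooled_ss / pooled_df, pooled_df


# Interaction and within-cells entries taken from Table 19.8.
pooled_ms, pooled_df = pooled_error(1332.04, 2, 42667.38, 42)
print(round(pooled_ms, 2), pooled_df)
```

Here the pooled estimate is based on 44 rather than 42 degrees of freedom and differs only slightly from the within-cells estimate of 1,015.89.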

19.9 

Computation formulas for sums of squares 

Computation formulas are used to calculate the required sums of squares. 
A simplified notation is used. Denote the sum of all observations in the 
rth row by $T_{r.}$, the sum of all observations in the cth column by $T_{.c}$, the 
sum of all observations in the cell corresponding to the rth row and cth 
column by $T_{rc}$, and the sum of all $N$ observations by $T$. 

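With one entry in each cell, the computation formulas for sums of 
squares are as follows: 

Rows 

$$\frac{1}{C}\sum_{r=1}^{R} T_{r.}^2 - \frac{T^2}{N} \tag{19.3}$$

Columns 

$$\frac{1}{R}\sum_{c=1}^{C} T_{.c}^2 - \frac{T^2}{N} \tag{19.4}$$

Interaction 

$$\sum_{r=1}^{R}\sum_{c=1}^{C} X_{rc}^2 - \frac{1}{C}\sum_{r=1}^{R} T_{r.}^2 - \frac{1}{R}\sum_{c=1}^{C} T_{.c}^2 + \frac{T^2}{N} \tag{19.5}$$

Total 

$$\sum_{r=1}^{R}\sum_{c=1}^{C} X_{rc}^2 - \frac{T^2}{N} \tag{19.6}$$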


The interaction sum of squares may be obtained by adding the row and 
column sums of squares and subtracting this from the total sum of squares. This 
provides no check on the accuracy of the calculation; consequently it is 
preferable to compute the interaction term directly. 

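Computation formulas for sums of squares with n entries in each 
cell are as follows: 

Rows 

$$\frac{1}{nC}\sum_{r=1}^{R} T_{r.}^2 - \frac{T^2}{N} \tag{19.7}$$

Columns 

$$\frac{1}{nR}\sum_{c=1}^{C} T_{.c}^2 - \frac{T^2}{N} \tag{19.8}$$

Interaction 

$$\frac{1}{n}\sum_{r=1}^{R}\sum_{c=1}^{C} T_{rc}^2 - \frac{1}{nC}\sum_{r=1}^{R} T_{r.}^2 - \frac{1}{nR}\sum_{c=1}^{C} T_{.c}^2 + \frac{T^2}{N} \tag{19.9}$$

Within cells 

$$\sum_{r=1}^{R}\sum_{c=1}^{C}\sum_{i=1}^{n} X_{rci}^2 - \frac{1}{n}\sum_{r=1}^{R}\sum_{c=1}^{C} T_{rc}^2 \tag{19.10}$$

Total 

$$\sum_{r=1}^{R}\sum_{c=1}^{C}\sum_{i=1}^{n} X_{rci}^2 - \frac{T^2}{N} \tag{19.11}$$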


Here again the interaction sums of squares may be obtained by subtract- 
ing the row, column, and within-cells sums of squares from the total, 
although direct calculation of the interaction term is preferable. 

The reader should note that the analysis of variance for two-way 
classification with a single entry in each cell is a particular case of the 
more general case with more than one entry in each cell. When n = 1, 
formulas for the latter case become the formulas for the former. 
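
The computation formulas for n entries per cell translate directly into code. The following is a minimal sketch, not a definitive implementation: the function name and the small 2 X 2 data set are hypothetical and were invented purely to show the arithmetic. Equal cell sizes are assumed.

```python
def two_way_sums_of_squares(cells):
    """Sums of squares for a two-way layout with n entries per cell.

    cells[r][c] is the list of n observations in row r, column c.
    """
    R, C = len(cells), len(cells[0])
    n = len(cells[0][0])
    N = n * R * C
    cell_totals = [[sum(cells[r][c]) for c in range(C)] for r in range(R)]
    row_totals = [sum(cell_totals[r]) for r in range(R)]
    col_totals = [sum(cell_totals[r][c] for r in range(R)) for c in range(C)]
    T = sum(row_totals)

    correction = T * T / N                       # T^2 / N
    raw = sum(x * x for row in cells for cell in row for x in cell)
    cells_term = sum(t * t for row in cell_totals for t in row) / n
    rows_term = sum(t * t for t in row_totals) / (n * C)
    cols_term = sum(t * t for t in col_totals) / (n * R)

    return {
        "rows": rows_term - correction,
        "columns": cols_term - correction,
        "interaction": cells_term - rows_term - cols_term + correction,
        "within": raw - cells_term,
        "total": raw - correction,
    }


# Hypothetical 2 x 2 data with n = 2, invented for illustration only.
toy = [[[1, 3], [2, 4]],
       [[5, 7], [9, 11]]]
ss = two_way_sums_of_squares(toy)
print(ss)
```

Computing the interaction term directly, as recommended in the text, allows the four components to be checked against the total. With n = 1 the within-cells term is zero and the remaining expressions reduce to the single-entry formulas.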


19.10 

Illustrative example of two-way classification : 

n = 1 

Table 19.4 shows hypothetical data for two-way classification with one 
entry per cell. Rows are individuals, and columns are treatments. The 
data are presumed to relate to a random sample of individuals tested 
under different treatment conditions. This is a mixed model. One basis 
of classification, the columns, is fixed. The other basis of classification, 
the rows, is random. 

Applying the appropriate computation formulas, the following sums 
of squares are obtained : 

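Rows 

$$\frac{1}{C}\sum_{r=1}^{R} T_{r.}^2 - \frac{T^2}{N} = \frac{394{,}350}{4} - \frac{(1{,}970)^2}{40} = 1{,}565.00$$

Columns 

$$\frac{1}{R}\sum_{c=1}^{C} T_{.c}^2 - \frac{T^2}{N} = \frac{1{,}045{,}756}{10} - \frac{(1{,}970)^2}{40} = 7{,}553.10$$

Interaction 

$$\sum_{r=1}^{R}\sum_{c=1}^{C} X_{rc}^2 - \frac{1}{C}\sum_{r=1}^{R} T_{r.}^2 - \frac{1}{R}\sum_{c=1}^{C} T_{.c}^2 + \frac{T^2}{N} = 122{,}984 - \frac{394{,}350}{4} - \frac{1{,}045{,}756}{10} + \frac{(1{,}970)^2}{40} = 16{,}843.40$$

Total 

$$\sum_{r=1}^{R}\sum_{c=1}^{C} X_{rc}^2 - \frac{T^2}{N} = 122{,}984 - \frac{(1{,}970)^2}{40} = 25{,}961.50$$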


Table 19.4 

Data for the analysis of variance with two-way classification: 
n = 1; scores for a sample of subjects tested under 
four different conditions 

                     Conditions
Subject        A       B       C       D     $T_{r.}$   $\bar{X}_{r.}$
1             31      42      14      80      167      41.75
2             42      26      25     106      199      49.75
3             84      21      19      83      207      51.75
4             26      60      36      69      191      47.75
5             14      35      44      48      141      35.25
6             16      80      28      76      200      50.00
7             29      49      80      39      197      49.25
8             32      38      76      84      230      57.50
9             45      65      15      91      216      54.00
10            30      71      82      39      222      55.50
$T_{.c}$       349     487     419     715    $T$ = 1,970
$\bar{X}_{.c}$   34.90   48.70   41.90   71.50    $\bar{X}$ = 49.25

$\sum_{r=1}^{R} T_{r.}^2 = 394{,}350$      $\sum_{c=1}^{C} T_{.c}^2 = 1{,}045{,}756$      $\sum_{r=1}^{R}\sum_{c=1}^{C} X_{rc}^2 = 122{,}984$

Table 19.5 summarizes the analysis-of-variance data for this example. 
Because this is a mixed model with n = 1 and $F_r = s_r^2/s_i^2 = .279$, no 
meaningful test of row effects is possible. The proper error term for 
column effects is $s_i^2$. The F ratio for column effects is found to be 4.04. 
The F ratios required for significance with 3 and 27 degrees of freedom 
associated with the numerator and denominator, respectively, are 2.96 
at the 5 per cent and 4.60 at the 1 per cent levels. Thus the column 
differences are significant at the 5 per cent level but fall short of 
significance at the 1 per cent level. 


Table 19.5 

Analysis of variance for data of Table 19.4 

Source of        Sum of        Degrees of     Variance
variation        squares       freedom        estimate
Rows              1,565.00         9             173.89 = $s_r^2$
Columns           7,553.10         3           2,517.70 = $s_c^2$
Interaction      16,843.40        27             623.83 = $s_i^2$
Total            25,961.50        39

$F_c = s_c^2/s_i^2 = 4.04$      $F_r = s_r^2/s_i^2 = .279$
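
The analysis of Table 19.5 can be reproduced with a short script. This is a sketch under stated assumptions: the variable names are hypothetical, and the first score for subject 1, which is illegible in some printings, is taken as 31 because that value agrees with the row total of 167 and the column total of 349.

```python
# Scores from Table 19.4: one row per subject, columns are conditions A-D.
scores = [
    [31, 42, 14, 80],
    [42, 26, 25, 106],
    [84, 21, 19, 83],
    [26, 60, 36, 69],
    [14, 35, 44, 48],
    [16, 80, 28, 76],
    [29, 49, 80, 39],
    [32, 38, 76, 84],
    [45, 65, 15, 91],
    [30, 71, 82, 39],
]

R, C = len(scores), len(scores[0])
N = R * C
row_totals = [sum(row) for row in scores]
col_totals = [sum(row[c] for row in scores) for c in range(C)]
T = sum(row_totals)

correction = T ** 2 / N                                  # T^2 / N
rows_term = sum(t ** 2 for t in row_totals) / C
cols_term = sum(t ** 2 for t in col_totals) / R
raw_term = sum(x ** 2 for row in scores for x in row)

ss_rows = rows_term - correction
ss_cols = cols_term - correction
ss_inter = raw_term - rows_term - cols_term + correction  # computed directly
ss_total = raw_term - correction

ms_rows = ss_rows / (R - 1)
ms_cols = ss_cols / (C - 1)
ms_inter = ss_inter / ((R - 1) * (C - 1))

# Mixed model, n = 1: columns (fixed) are tested against the interaction.
f_cols = ms_cols / ms_inter
f_rows = ms_rows / ms_inter

print(round(ss_rows, 2), round(ss_cols, 2), round(ss_inter, 2), round(ss_total, 2))
print(round(f_cols, 2), round(f_rows, 3))
```

The sums of squares and the two ratios agree with the entries of Table 19.5 to the number of decimals shown there.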


Table 19.6 

Data for the analysis of variance with two-way classification: 
n > 1; error scores for three strains of rats reared 
under two environmental conditions 

                               Strain
Environment       Bright         Mixed           Dull
Free              26   14       41   82       36   87
                  41   16       26   86       39   99
                  28   29       19   45       59  126
                  92   31       59   37       27  104

Restricted        51   35       39  114       42  133
                  96   36      104   92       92  124
                  97   28      130   87      156   68
                  22   76      122   64      144  142





19.11 

Illustrative example of two-way classification: n > 1 


Table 19.6 shows data obtained in an animal experiment designed to 
study the effects of two variables on measures of performance of rats in 
a maze test. Three strains of rats were used, bright, mixed, and dull. 
A group from each strain was reared under free and restricted 
environmental conditions. Thus there are six groups of experimental animals 
with eight animals in each group. The total N is 48. The data are 
arranged in a 2 X 3 table with eight observations in each of the six cells. 
The row means permit a comparison of environments, and the column 
means a comparison of strains. Table 19.7 shows the sums, means, and 
sums of squares of row, column, and cell totals. The sum of squares for 
all the observations is also given. 

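Table 19.7 

Computation for data of Table 19.6 

                                        Strain
Environment     Bright                  Mixed                   Dull                    Total
Free            $T_{11}$ = 277            $T_{12}$ = 395            $T_{13}$ = 577            $T_{1.}$ = 1,249
                $\bar{X}_{11}$ = 34.63    $\bar{X}_{12}$ = 49.38    $\bar{X}_{13}$ = 72.13    $\bar{X}_{1.}$ = 52.04
Restricted      $T_{21}$ = 441            $T_{22}$ = 752            $T_{23}$ = 901            $T_{2.}$ = 2,094
                $\bar{X}_{21}$ = 55.13    $\bar{X}_{22}$ = 94.00    $\bar{X}_{23}$ = 112.63   $\bar{X}_{2.}$ = 87.25
Total           $T_{.1}$ = 718            $T_{.2}$ = 1,147          $T_{.3}$ = 1,478          $T$ = 3,343
                $\bar{X}_{.1}$ = 44.88    $\bar{X}_{.2}$ = 71.69    $\bar{X}_{.3}$ = 92.38    $\bar{X}$ = 69.65

$\sum_{r=1}^{R} T_{r.}^2 = 5{,}944{,}837$      $\sum_{c=1}^{C} T_{.c}^2 = 4{,}015{,}617$

$\sum_{r=1}^{R}\sum_{c=1}^{C} T_{rc}^2 = 2{,}137{,}469$      $\sum_{r=1}^{R}\sum_{c=1}^{C}\sum_{i=1}^{n} X_{rci}^2 = 309{,}851$

Applying the computation formulas, the calculations are as follows: 

Rows 

$$\frac{1}{nC}\sum_{r=1}^{R} T_{r.}^2 - \frac{T^2}{N} = \frac{5{,}944{,}837}{24} - \frac{(3{,}343)^2}{48} = 14{,}875.52$$

Columns 

$$\frac{1}{nR}\sum_{c=1}^{C} T_{.c}^2 - \frac{T^2}{N} = \frac{4{,}015{,}617}{16} - \frac{(3{,}343)^2}{48} = 18{,}150.04$$

Within cells 

$$\sum_{r=1}^{R}\sum_{c=1}^{C}\sum_{i=1}^{n} X_{rci}^2 - \frac{1}{n}\sum_{r=1}^{R}\sum_{c=1}^{C} T_{rc}^2 = 309{,}851 - \frac{2{,}137{,}469}{8} = 42{,}667.38$$

Interaction 

$$\frac{1}{n}\sum_{r=1}^{R}\sum_{c=1}^{C} T_{rc}^2 - \frac{1}{nC}\sum_{r=1}^{R} T_{r.}^2 - \frac{1}{nR}\sum_{c=1}^{C} T_{.c}^2 + \frac{T^2}{N} = \frac{2{,}137{,}469}{8} - \frac{5{,}944{,}837}{24} - \frac{4{,}015{,}617}{16} + \frac{(3{,}343)^2}{48} = 1{,}332.04$$

Total 

$$\sum_{r=1}^{R}\sum_{c=1}^{C}\sum_{i=1}^{n} X_{rci}^2 - \frac{T^2}{N} = 309{,}851 - \frac{(3{,}343)^2}{48} = 77{,}024.98$$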


The analysis-of-variance table for these data is given in Table 19.8. 
The df for rows is R - 1 = 2 - 1 = 1, for columns C - 1 = 3 - 1 = 2, 
for interaction (R - 1)(C - 1) = (2 - 1)(3 - 1) = 2, and for within 
cells RC(n - 1) = 2 X 3(8 - 1) = 42. These sum to the total degrees of 
freedom, RCn - 1 = 2 X 3 X 8 - 1 = 47. 

Table 19.8 

Analysis of variance for data of Table 19.6 

Source of                 Sum of        Degrees of     Variance
variation                 squares       freedom        estimate
Rows (environments)       14,875.52         1          14,875.52 = $s_r^2$
Columns (strains)         18,150.04         2           9,075.02 = $s_c^2$
Interaction                1,332.04         2             666.02 = $s_i^2$
Within cells              42,667.38        42           1,015.89 = $s_w^2$
Total                     77,024.98        47

$F_i = s_i^2/s_w^2 = .656$      $F_r = s_r^2/s_w^2 = 14.64$      $F_c = s_c^2/s_w^2 = 8.93$

For these data a fixed model is appropriate and $s_w^2$ is the proper 
error term for testing row, column, and interaction effects. For 
interaction we have 

$$F_i = \frac{s_i^2}{s_w^2} = \frac{666.02}{1{,}015.89} = .656$$

This is less than unity. The expectation on the basis of the null 
hypothesis is unity. The interaction is somewhat less than we would ordinarily 
expect under the null hypothesis. We may safely conclude that there 
is no significant interaction between the two experimental variables. 
For differences in environments we have 

$$F_r = \frac{s_r^2}{s_w^2} = \frac{14{,}875.52}{1{,}015.89} = 14.64$$

with 1 df associated with the numerator and 42 df with the denominator. 
For these df the values required for significance at the 5 and 1 per cent 
levels are 4.07 and 7.27. We conclude that the different environments 
have affected the maze performance of the animals. For strains the 
required ratio is $F_c = s_c^2/s_w^2 = 9{,}075.02/1{,}015.89 = 8.93$ with 2 df 
associated with the numerator and 42 df with the denominator. Again, 
this difference is significant at well beyond the 1 per cent level, and the 
conclusion is that differences in strain affect maze performance. 
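
The calculations for this example may be checked with the sketch below. The variable names are hypothetical. The grouping of the 48 scores into cells is an assumption based on reading Table 19.6 as two columns of four scores per cell; it reproduces every cell, row, and column total reported in Table 19.7.

```python
# Error scores from Table 19.6, arranged as cells[environment][strain].
cells = [
    [  # free environment
        [26, 14, 41, 16, 28, 29, 92, 31],      # bright
        [41, 82, 26, 86, 19, 45, 59, 37],      # mixed
        [36, 87, 39, 99, 59, 126, 27, 104],    # dull
    ],
    [  # restricted environment
        [51, 35, 96, 36, 97, 28, 22, 76],      # bright
        [39, 114, 104, 92, 130, 87, 122, 64],  # mixed
        [42, 133, 92, 124, 156, 68, 144, 142], # dull
    ],
]

R, C, n = len(cells), len(cells[0]), len(cells[0][0])
N = R * C * n
cell_totals = [[sum(cell) for cell in row] for row in cells]
row_totals = [sum(row) for row in cell_totals]
col_totals = [sum(row[c] for row in cell_totals) for c in range(C)]
T = sum(row_totals)

correction = T ** 2 / N
rows_term = sum(t ** 2 for t in row_totals) / (n * C)
cols_term = sum(t ** 2 for t in col_totals) / (n * R)
cells_term = sum(t ** 2 for row in cell_totals for t in row) / n
raw_term = sum(x ** 2 for row in cells for cell in row for x in cell)

ss = {
    "rows": rows_term - correction,
    "columns": cols_term - correction,
    "interaction": cells_term - rows_term - cols_term + correction,
    "within": raw_term - cells_term,
}
df = {
    "rows": R - 1,
    "columns": C - 1,
    "interaction": (R - 1) * (C - 1),
    "within": R * C * (n - 1),
}
ms = {source: ss[source] / df[source] for source in ss}

# Fixed model: every effect is tested against the within-cells estimate.
f = {source: ms[source] / ms["within"]
     for source in ("rows", "columns", "interaction")}

for source in ss:
    print(source, round(ss[source], 2), df[source], round(ms[source], 2))
print({source: round(value, 3) for source, value in f.items()})
```

The printed sums of squares, degrees of freedom, and variance estimates correspond to the rows of Table 19.8.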


19.12 

Unequal numbers in the subclasses 

Situations arise in educational and psychological research where the 
numbers of observations in the subclasses in a two-way analysis of 
variance are unequal. In animal experimentation in psychology, this 
situation may result from loss by death or accident of a number of animals 
during the conduct of the experiment. For the fixed model, if the cell 
frequencies do not depart significantly from either equality or propor- 
tionality, simple adjustments may be made to the data. Two methods 
will be briefly described: the method of expected equal frequencies and 
the method of expected proportionate frequencies. The treatment given 
here is based on the work of Fei Tsao (1946). 

In applying the method of expected equal frequencies the following 
steps are involved: 

1 Apply a $\chi^2$ criterion to determine whether the cell 
frequencies depart from equality. Denote the frequency 
in the cell corresponding to the rth row and cth column 
by $n_{rc}$. The expected equal frequency is the 
average value of $n_{rc}$, or $N/RC$. Denote this by $\bar{n}$. 
The required $\chi^2$ is 

$$\chi^2 = \sum_{r=1}^{R}\sum_{c=1}^{C} \frac{(n_{rc} - \bar{n})^2}{\bar{n}}$$

with $RC - 1$ degrees of freedom. 

2 If the cell frequencies do not depart significantly from 
equality at, say, the 1 per cent level, apply a simple 
adjustment to the sum and sum of squares for each 
cell by multiplying these values by $\bar{n}/n_{rc}$. Thus the 
adjusted cell sum is 

$$\frac{\bar{n}}{n_{rc}} \sum_{i=1}^{n_{rc}} X_{rci}$$

and the adjusted cell sum of squares is 

$$\frac{\bar{n}}{n_{rc}} \sum_{i=1}^{n_{rc}} X_{rci}^2$$

This adjustment estimates what the cell sum and sum 
of squares would be were there an equal number of 
cases $\bar{n}$ in each cell. Note that this adjustment does 
not change the cell means or the row and column 
means. 

3 Use the adjusted cell sums and sums of squares to 
obtain row and column totals and the total sum of 
squares. 

4 Proceed with the analysis of variance in the usual 
way, employing the computation formulas given in 
Sec. 19.9. 


The method of expected equal frequencies is simple and may be 
usefully applied where the numbers of observations in the cells do not 
differ very much. 
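
The first two steps of the method of expected equal frequencies can be sketched as follows. The function name and the small data set are hypothetical and serve only to show the arithmetic. The $\chi^2$ value returned must still be referred to a table of $\chi^2$ with RC - 1 degrees of freedom.

```python
def equal_frequency_adjustment(cells):
    """Chi-square for equality of cell sizes, plus adjusted cell sums.

    cells[r][c] is a list of observations; cell sizes may differ.
    Returns (chi_square, df, adjusted_sums, adjusted_sums_of_squares).
    """
    sizes = [[len(cell) for cell in row] for row in cells]
    n_cells = sum(len(row) for row in cells)
    n_bar = sum(sum(row) for row in sizes) / n_cells      # N / RC

    chi_square = sum((n - n_bar) ** 2 / n_bar for row in sizes for n in row)
    df = n_cells - 1

    # Multiply each cell sum and sum of squares by n_bar / n_rc.
    adj_sums = [[n_bar * sum(cell) / len(cell) for cell in row]
                for row in cells]
    adj_squares = [[n_bar * sum(x * x for x in cell) / len(cell)
                    for cell in row]
                   for row in cells]
    return chi_square, df, adj_sums, adj_squares


# Hypothetical 2 x 2 layout with 3, 5, 4, and 4 observations per cell.
toy = [[[2, 4, 6], [1, 2, 3, 4, 5]],
       [[3, 3, 5, 5], [2, 4, 6, 8]]]
chi_square, df, adj_sums, adj_squares = equal_frequency_adjustment(toy)
print(chi_square, df)
print(adj_sums)
```

Dividing each adjusted sum by the common frequency of 4 returns the original cell means, which is the property noted in step 2.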

In situations where the numbers of observations in the cells differ, 
but are roughly proportionate to the marginal totals, the method of 
expected proportionate frequencies is appropriate. This method requires 
the following steps: 

1 Apply a $\chi^2$ criterion to determine whether the cell 
frequencies in the rows and columns depart 
significantly from proportionality. Denote the observed 
frequency in the cell corresponding to the rth row and 
cth column by $n_{rc}$ and the marginal frequencies for 
rows and columns by $n_{r.}$ and $n_{.c}$, respectively. Denote 
the cell frequencies expected on the assumption of 
proportionality by $\tilde{n}_{rc}$. The expected frequencies are 
given by 

$$\tilde{n}_{rc} = \frac{n_{r.}\, n_{.c}}{N}$$

The procedure here is identical with that used in 
calculating expected cell frequencies for a contingency table 
given the restrictions of the marginal totals. The $\chi^2$ 
criterion is 

$$\chi^2 = \sum_{r=1}^{R}\sum_{c=1}^{C} \frac{(n_{rc} - \tilde{n}_{rc})^2}{\tilde{n}_{rc}}$$

with $(R - 1)(C - 1)$ degrees of freedom. 

2 If the cell frequencies do not depart significantly from 
proportionality, the sum and sum of squares for each 
cell are adjusted by multiplying them by $\tilde{n}_{rc}/n_{rc}$. The 
adjusted cell sum is then 

$$\frac{\tilde{n}_{rc}}{n_{rc}} \sum_{i=1}^{n_{rc}} X_{rci}$$

and the adjusted cell sum of squares is 

$$\frac{\tilde{n}_{rc}}{n_{rc}} \sum_{i=1}^{n_{rc}} X_{rci}^2$$

This adjustment provides estimates of what the cell 
sums and sums of squares would be were the numbers 
in each cell proportional to the marginal totals. 

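3 The required sums of squares for the analysis of 
variance are obtained, using the adjusted values, by 
applying the following formulas: 

Rows 

$$\sum_{r=1}^{R} \frac{T_{r.}^2}{n_{r.}} - \frac{T^2}{N} \tag{19.12}$$

Columns 

$$\sum_{c=1}^{C} \frac{T_{.c}^2}{n_{.c}} - \frac{T^2}{N} \tag{19.13}$$

Within cells 

$$\sum_{r=1}^{R}\sum_{c=1}^{C} \frac{\tilde{n}_{rc}}{n_{rc}} \sum_{i=1}^{n_{rc}} X_{rci}^2 - \sum_{r=1}^{R}\sum_{c=1}^{C} \frac{T_{rc}^2}{\tilde{n}_{rc}} \tag{19.14}$$

Interaction 

$$\sum_{r=1}^{R}\sum_{c=1}^{C} \frac{T_{rc}^2}{\tilde{n}_{rc}} - \sum_{r=1}^{R} \frac{T_{r.}^2}{n_{r.}} - \sum_{c=1}^{C} \frac{T_{.c}^2}{n_{.c}} + \frac{T^2}{N} \tag{19.15}$$

Total 

$$\sum_{r=1}^{R}\sum_{c=1}^{C} \frac{\tilde{n}_{rc}}{n_{rc}} \sum_{i=1}^{n_{rc}} X_{rci}^2 - \frac{T^2}{N} \tag{19.16}$$

All T's relate to adjusted values. The above formulas 
differ from those previously given in Sec. 19.9 only in that 
they make allowance for the fact that the numbers of 
cases in the subclasses are unequal. 

4 Proceed with the analysis of variance in the usual way. 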

In the above procedure the within-cells sum of squares is based on the 
adjusted values. Arguments may be advanced for using the unadjusted 
values in calculating the within-cells sum of squares. For comment on 
this point see Gourlay (1955). 
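
The first two steps of the method of expected proportionate frequencies can be sketched in the same way. The function name and the data are hypothetical, invented for illustration. The expected frequencies are computed from the marginal totals exactly as for a contingency table.

```python
def proportionate_frequency_adjustment(cells):
    """Chi-square for proportionality of cell sizes, plus adjusted sums.

    cells[r][c] is a list of observations; cell sizes may differ.
    Returns (chi_square, df, expected_sizes, adjusted_sums).
    """
    R, C = len(cells), len(cells[0])
    sizes = [[len(cells[r][c]) for c in range(C)] for r in range(R)]
    row_n = [sum(sizes[r]) for r in range(R)]
    col_n = [sum(sizes[r][c] for r in range(R)) for c in range(C)]
    N = sum(row_n)

    # Expected frequency under proportionality: row total x column total / N.
    expected = [[row_n[r] * col_n[c] / N for c in range(C)] for r in range(R)]
    chi_square = sum((sizes[r][c] - expected[r][c]) ** 2 / expected[r][c]
                     for r in range(R) for c in range(C))
    df = (R - 1) * (C - 1)

    # Multiply each cell sum by expected size / observed size.
    adj_sums = [[expected[r][c] * sum(cells[r][c]) / sizes[r][c]
                 for c in range(C)]
                for r in range(R)]
    return chi_square, df, expected, adj_sums


# Hypothetical 2 x 2 layout with 2, 4, 3, and 7 observations per cell.
toy = [[[1, 3], [2, 4, 6, 8]],
       [[3, 5, 7], [1, 2, 3, 4, 5, 6, 7]]]
chi_square, df, expected, adj_sums = proportionate_frequency_adjustment(toy)
print(round(chi_square, 4), df)
print(expected)
```

The expected frequencies in each row and column add to the observed marginal frequencies, so the marginal n's used as divisors in the adjusted analysis are unchanged.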

Both the methods of expected equal and expected proportionate 
frequencies are in some degree approximate. Departure from equal n's in 
the former method and from proportionality of n's in the latter method 
will introduce some bias in the F test, the extent of the bias being related 
to the magnitude of the departures. By bias here is meant that the F 
test produces either a larger or smaller proportion of significant F ratios 
than is warranted by the F distribution. 

The methods of equal and proportionate frequencies are applicable 
to a substantial proportion of situations encountered in practice. When 
the frequencies differ markedly from proportionality, other methods may 
be applied. For a discussion of these, see Snedecor (1956) and Kenney 
and Keeping (1954). 

For the random model, bias is introduced in the F test despite the 
proportionality of the numbers in the subclasses. From a practical 
viewpoint this is not an important consideration. Good examples of 
the random model with unequal n's are difficult to find in educational 
and psychological research. Of more practical importance is the fact 
that for the mixed model F test bias is introduced when the cell 
frequencies are proportional, and experiments involving this model are not 
infrequent. The bias is positive, the F test producing a larger proportion 
of significant F ratios than the F distribution warrants. For a discussion 
of this problem the reader is directed to Gourlay (1955). 

In general, because of the complications associated with unequal 
frequencies, it is advisable, whenever possible, to design experiments with 
an equal number of cases in the subclasses, although for the fixed model 
proportionate numbers of cases in the subclasses will introduce no 
bias. The investigator will thereby avoid a number of inconvenient 
complexities. 


19.13 

Higher-order classification 

This chapter has concerned itself with the analysis of variance for 
experiments with two bases of classification. Experiments may be designed 
with more than two bases of classification with either one or more than 
one observation per cell. A common design with three bases of 
classification occurs where observations are made on every individual in a 
sample under RC different treatment conditions. A consideration of 
higher-order classification is beyond the scope of this book. For a 
discussion of this topic the reader is referred to Walker and Lev (1953) and 
to McNemar (1962). On choice of proper error term for higher-order 
classification an examination of Wilk and Kempthorne (1955) will prove 
helpful. 


EXERCISES 


1 In an experiment involving double classification with 10 observations 
in each cell, the following cell and marginal means were obtained: 

            $C_1$     $C_2$     $C_3$     Mean
$R_1$        8.3     3.2    17.4     9.6
$R_2$       12.5     4.6    12.6     9.9
Mean       10.4     3.9    15.0     9.8

Compute (a) the cell means expected under zero interaction and (b) 
the interaction sum of squares. 


2 The following are measurements made on a sample of 12 subjects 
under three experimental conditions: 

                   Condition
Subject        $C_1$      $C_2$      $C_3$
1               8        7       15
2              19       14       20
3               7        9        6
4              23       20       18
5              14       26       12
6               6       14       15
7               5        9       20
8              22       25       20
9              11       15       16
10              4       12        8
11             13       18       20
12              8        6       28
$T_{.c}$        140      175      198
$\bar{X}_{.c}$    11.67    14.58    16.50

Obtain the sums of squares and the variance estimates. Test the 
column means on the assumption that experimental condition is a 
fixed variable. 


3 The following are data for a double-classification experiment involving 
two fixed variables: 

     $C_1$          $C_2$          $C_3$
   29   31       23   62       17   32
   26   50       31   60       18   49
   42   25       18   20       50   58
   17   62       35   83       17   28
   27   62       50   42       14   58
   50   29       62   19       49   62

Apply the analysis of variance to test the significance of row, column, 
and interaction effects. 






4 The following are data with unequal numbers in the subclasses: 

              $C_1$               $C_2$
$R_1$       8   9  20          6 116
           5  16  11          2   4
          23   4   2          1   3

$R_2$       8  20             11  15   6
          14  16             12  18   3
          12  15              6   2

Apply the analysis of variance to test row, column, and interaction 
effects on the assumption that the two experimental variables are fixed. 

5 Compare means two at a time for the data of Exercise 2 above using 
Scheffé's procedure described in Chap. 18. Indicate which 
comparisons are significant at the .05 level. 




Analysis of Covariance 


20 


20.1 

Introduction 


One object of experimental design is to ensure that the results observed 
may be attributed within limits of error to the treatment variable and 
to no other causal circumstance. For example, the assignment of sub- 
jects to groups at random and the matching of subjects are experimental 
procedures the purpose of which is to ensure freedom from bias. Situ- 
ations arise, however, where one or more variables are uncontrolled 
because of practical limitations associated with the conduct of the 
experiment. A statistical, rather than an experimental, method may 
be used to “control” or “adjust for” the effects of one or more uncon- 
trolled variables, and permit, thereby, a valid evaluation of the outcome 
of the experiment. The analysis of covariance is such a method. 

To illustrate, an investigator may wish to compare three different 
methods of learning French. Each method is applied to a different 
group of subjects. Mean scores on a test of French achievement, follow- 
ing a period of instruction, are obtained for the three groups. If subjects 
had been assigned to the three groups at random, the means for the 
three groups could be directly compared. Practical considerations may, 
however, prevent the assignment of subjects to groups at random. The 
investigator may be required to use three existing classes of pupils. 
These classes may differ in intelligence, and intelligence may be corre- 
lated with French achievement. Thus the investigator does not know 
the extent to which the differences in French achievement result from 
the different methods of instruction or from differences in the intelligence 
of the three classes. Intelligence in this situation is an uncontrolled 
variable. If measures of intelligence are available, the analysis of 
covariance may be used to compare the differences in French achieve- 
ment between classes, with the influence of intelligence, as it were, 
statistically controlled. 

Consider another example. The effect of two drugs on motor performance 
is under study. Two groups of subjects are tested in an initial 
predrug condition and under the influence of one of the drugs. The 
initial level of motor performance for the two groups may be different. 
Initial level is an uncontrolled variable. Part of the differences in motor 
performance under the drug condition may be due to differences in 
initial level. The analysis of covariance may be used to remove the 
bias introduced by differences in initial level and permit the making of 
unbiased comparisons between drug effects. The analysis of covariance 
is quite commonly used in drug studies in just this way. 

In psychology and education primary interest in the analysis of 
covariance rests in its use as a procedure for the statistical control of an 
uncontrolled variable. It may, however, serve other purposes, such as 
testing the homogeneity of a set of regression coefficients and related 
hypotheses. 

In applications of the analysis of covariance the influence of the 
uncontrolled variable, sometimes called the covariate or the concomitant 
variable, is usually removed by a simple linear regression method, and 
the residual sums of squares are used to provide variance estimates 
which in turn are used to make tests of significance. The reader is 
advised at this time to review his knowledge of simple linear regression. 

20.2 

Notation 

An application of a simple analysis of covariance requires paired 
observations on k groups of experimental subjects. The number of pairs 
of observations in the k groups is denoted by $n_1, n_2, \ldots, n_k$. The 
paired observations are assumed to be paired samples drawn from k 
populations. The data may be represented as follows: 



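\[
\begin{array}{lccccccc}
 & \multicolumn{2}{c}{\text{Group 1}} & \multicolumn{2}{c}{\text{Group 2}} & \cdots & \multicolumn{2}{c}{\text{Group } k} \\
 & Y_{11} & X_{11} & Y_{12} & X_{12} & \cdots & Y_{1k} & X_{1k} \\
 & Y_{21} & X_{21} & Y_{22} & X_{22} & \cdots & Y_{2k} & X_{2k} \\
 & Y_{31} & X_{31} & Y_{32} & X_{32} & \cdots & Y_{3k} & X_{3k} \\
 & \vdots & \vdots & \vdots & \vdots & & \vdots & \vdots \\
 & Y_{n_1 1} & X_{n_1 1} & Y_{n_2 2} & X_{n_2 2} & \cdots & Y_{n_k k} & X_{n_k k} \\
\text{Means} & \bar{Y}_1 & \bar{X}_1 & \bar{Y}_2 & \bar{X}_2 & \cdots & \bar{Y}_k & \bar{X}_k
\end{array}
\]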




In this notation X is the variable under study, the dependent vari- 
able, whereas Y is the uncontrolled variable, or covariate. In the 
examples of the previous section, either X is a measure of French achieve- 
ment and Y is a measure of intelligence, or X is a measure of motor 
performance under the influence of a drug and Y is a corresponding 
measure obtained under the initial predrug condition. The analysis 
of covariance enables a comparison of the group means of X adjusted 
for differences in the means of Y. The group means on Y and X are 
represented by $\bar{Y}_1, \bar{Y}_2, \ldots, \bar{Y}_k$ and $\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_k$. A dot 
notation $\bar{Y}_{.1}$, $\bar{Y}_{.2}$, and so on, where the dot represents the variable 
subscript, would perhaps be more appropriate. For convenience, however, 
we have chosen to use a notation without the dot. The grand means of 
Y and X are $\bar{Y}$ and $\bar{X}$, respectively. 

In the analysis of covariance, sums of products are considered. The 
sum of products for the observations in the jth group is denoted by 

\[ \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j)(Y_{ij} - \bar{Y}_j) \]

The sum of products for all observations in the k groups, that is, the 
total sum of products, is 

\[ \sum_{j=1}^{k} \sum_{i=1}^{n_j} (X_{ij} - \bar{X})(Y_{ij} - \bar{Y}) \]

20.3 

Partitioning a sum of products 

Before proceeding further with the discussion, it is useful to observe 
that with paired observations on k groups the total sum of products may 
be partitioned into within-groups and between-groups sums of products 
in a manner analogous to that in which a total sum of squares is 
partitioned into within-groups and between-groups sums of squares. As in 
the analysis of variance for one-way classification we may write 

\[ (X_{ij} - \bar{X}) = (X_{ij} - \bar{X}_j) + (\bar{X}_j - \bar{X}) \]

\[ (Y_{ij} - \bar{Y}) = (Y_{ij} - \bar{Y}_j) + (\bar{Y}_j - \bar{Y}) \]

These are multiplied, summed over the $n_j$ cases in the jth group, and 
summed over k groups. When this is done, two cross-product terms 
to the right vanish in the summation process, and we obtain 

\[
\sum_{j=1}^{k} \sum_{i=1}^{n_j} (X_{ij} - \bar{X})(Y_{ij} - \bar{Y})
= \sum_{j=1}^{k} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j)(Y_{ij} - \bar{Y}_j)
+ \sum_{j=1}^{k} n_j (\bar{X}_j - \bar{X})(\bar{Y}_j - \bar{Y})
\tag{20.1}
\]





The term to the left is a total sum of products of deviations about the 
grand means $\bar{X}$ and $\bar{Y}$. The first term to the right is the sum of products 
of deviations about the group means. It is the within-groups sum of 
products. The second term to the right is the sum of products of 
deviations between groups. It is the between-groups sum of products. 
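
The partition in Eq. 20.1 can be checked numerically. The following is a minimal sketch in Python; the function name `partition_sum_of_products` and the three small groups of data are hypothetical and are not taken from the text.

```python
def mean(values):
    return sum(values) / len(values)


def partition_sum_of_products(groups):
    """groups: list of (y_values, x_values) pairs, one pair per group.

    Returns the total, within-groups, and between-groups sums of products.
    """
    all_y = [y for ys, _ in groups for y in ys]
    all_x = [x for _, xs in groups for x in xs]
    grand_y, grand_x = mean(all_y), mean(all_x)

    # Total: deviations of every observation about the grand means.
    total = sum((x - grand_x) * (y - grand_y) for x, y in zip(all_x, all_y))

    within = 0.0
    between = 0.0
    for ys, xs in groups:
        my, mx = mean(ys), mean(xs)
        # Within: deviations about the group's own means.
        within += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        # Between: group means about the grand means, weighted by n_j.
        between += len(xs) * (mx - grand_x) * (my - grand_y)
    return total, within, between


# Hypothetical data: three groups of unequal size.
groups = [
    ([2, 5, 7], [7, 6, 9]),
    ([8, 12, 15, 18], [15, 24, 25, 19]),
    ([15, 16], [30, 35]),
]
total, within, between = partition_sum_of_products(groups)
print(total, within + between)
```

The two printed values should agree apart from floating-point rounding, whatever the group sizes.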

20.4 

Regression lines 

With data consisting of paired observations for k groups, a number of 
different regression lines may be identified. The slope of the regression 
line used in predicting X from a knowledge of Y, as given in Chap. 8, is 

\[ b_{xy} = \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{N} (Y_i - \bar{Y})^2} \]

This slope is obtained by dividing a sum of products by a sum of squares. 
If we divide the total, within-groups, and between-groups sums of products 
by the corresponding sums of squares, three regression coefficients are 
obtained. These are the slopes of three different regression lines. 

The first is the total over-all regression line for predicting X from 
a knowledge of Y based on all the observations put together. The slope 
of this line is 

\[ b_t = \frac{\sum_{j=1}^{k} \sum_{i=1}^{n_j} (X_{ij} - \bar{X})(Y_{ij} - \bar{Y})}{\sum_{j=1}^{k} \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y})^2} \tag{20.3} \]

A second regression line is the over-all within-groups regression line. 
In predicting X from a knowledge of Y we may consider each of the 
k groups separately. Each group has its own within-group regression 
line with slope $b_j$. Information from these k separate regression lines 
may be pooled to obtain an over-all within-groups regression line whose 
slope is given by 

\[ b_w = \frac{\sum_{j=1}^{k} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j)(Y_{ij} - \bar{Y}_j)}{\sum_{j=1}^{k} \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_j)^2} \tag{20.4} \]

The numerator of this equation is the within-groups sum of products, and 
the denominator is the within-groups sum of squares for Y. It should 
be noted here that the slopes of the individual group regression lines, 
$b_1, b_2, \ldots, b_k$, are estimates of population parameters $\beta_1, \beta_2, \ldots, \beta_k$. 
The pooling process used to obtain $b_w$ involves an assumption of 
homogeneity of slope, that is, that $\beta_1 = \beta_2 = \cdots = \beta_k$. This is a basic 
assumption in the analysis of covariance. 

A third regression line may be considered with slope $b_b$ obtained by 
dividing the between-groups sum of products by the between-groups sum 
of squares for Y. The slope of this line is 

\[ b_b = \frac{\sum_{j=1}^{k} n_j (\bar{X}_j - \bar{X})(\bar{Y}_j - \bar{Y})}{\sum_{j=1}^{k} n_j (\bar{Y}_j - \bar{Y})^2} \tag{20.5} \]

Of particular interest in the analysis of covariance is the within-groups 
regression equation. This equation has the form 

\[ X'_{ij} = b_w (Y_{ij} - \bar{Y}_j) + \bar{X}_j \tag{20.6} \]

where $b_w$ is the within-groups regression slope. As will be shown, the 
analysis of covariance makes use of the sum of squares of residuals of X 
about this regression line. 
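
As an illustration of Eqs. 20.3 to 20.5, the sketch below computes the three slopes from raw paired data. It is a minimal example, not a definitive implementation: the helper names `sums` and `regression_slopes` and the data are hypothetical, and the between-groups quantities are obtained by subtraction, which the partition of Sec. 20.3 permits.

```python
def sums(ys, xs, my, mx):
    """Return (sum of squares of Y, sum of products) about the given means."""
    ss_y = sum((y - my) ** 2 for y in ys)
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return ss_y, sp


def regression_slopes(groups):
    """groups: list of (y_values, x_values). Returns (b_t, b_w, b_b)."""
    all_y = [y for ys, _ in groups for y in ys]
    all_x = [x for _, xs in groups for x in xs]
    grand_y = sum(all_y) / len(all_y)
    grand_x = sum(all_x) / len(all_x)

    # Total sum of squares on Y and total sum of products.
    ss_y_total, sp_total = sums(all_y, all_x, grand_y, grand_x)

    # Pool the within-group quantities over the k groups.
    ss_y_within = 0.0
    sp_within = 0.0
    for ys, xs in groups:
        ss, sp = sums(ys, xs, sum(ys) / len(ys), sum(xs) / len(xs))
        ss_y_within += ss
        sp_within += sp

    # Between-groups quantities by subtraction (total minus within).
    ss_y_between = ss_y_total - ss_y_within
    sp_between = sp_total - sp_within

    b_t = sp_total / ss_y_total
    b_w = sp_within / ss_y_within
    b_b = sp_between / ss_y_between
    return b_t, b_w, b_b


# Hypothetical data: the three slopes will in general differ.
groups = [
    ([3, 5, 8, 10], [6, 7, 11, 12]),
    ([6, 9, 11, 14], [12, 13, 17, 18]),
    ([10, 12, 15, 19], [15, 19, 20, 26]),
]
b_t, b_w, b_b = regression_slopes(groups)
print(b_t, b_w, b_b)
```

When every observation satisfies the same exact linear relation, all three slopes coincide; with real data they usually do not, which is why the choice of the within-groups slope matters.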

20.5 

Adjusting the sum of squares of X 

Given measurements on Y and X for k groups the total sum of squares 
for both Y and X may be partitioned into within-groups and between- 
groups sums of squares, using the analysis of variance methods described 
in Chap. 18. Also the total sum of products may be partitioned into 
within-groups and between-groups sums of products, as described in 
Sec. 20.3. How may the sum of squares on X be adjusted to allow for, 
or to remove, the influence of the variation of the uncontrolled variable 
Y? 

Let us first consider, in general, the problem of calculating a sum 
of squares of residuals of X about the linear regression line used in 
predicting X from a knowledge of Y. The equation for this linear regression 
line may be written 

\[ X'_i = b_{xy}(Y_i - \bar{Y}) + \bar{X} \]

where $X'_i$ is a predicted value, and $b_{xy}$ is the slope of the line. The sum 
of squares of residuals about this line is 

\[ \sum_{i=1}^{N} (X_i - X'_i)^2 \]

By substituting $b_{xy}(Y_i - \bar{Y}) + \bar{X}$ for $X'_i$ in this sum of squares and using 
simple algebra, it is readily shown that the sum of squares of residuals is 

\[
\sum_{i=1}^{N} (X_i - X'_i)^2
= \sum_{i=1}^{N} \left[ (X_i - \bar{X}) - b_{xy}(Y_i - \bar{Y}) \right]^2
= \sum_{i=1}^{N} (X_i - \bar{X})^2
- \frac{\left[ \sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y}) \right]^2}{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}
\tag{20.7}
\]


This latter expression says that in order to obtain a sum of squares of 
residuals on X , we subtract from the sum of squares of X a quantity 
which is equal to the square of the sum of products of X and Y divided 
by the sum of squares of Y. This quantity will be either zero or posi- 
tive. Thus in the nonzero case this procedure will always reduce the 
size of the sum of squares. A residual sum of squares obtained in this 
way is sometimes called a reduced sum of squares. 
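
A short numerical check may make the reduction concrete. The sketch below, written in Python with hypothetical data and a hypothetical function name, computes the residual sum of squares both directly from the predicted values and from the shortcut on the right of Eq. 20.7.

```python
def residual_sum_of_squares(y, x):
    """Return (direct, shortcut) residual sums of squares of X about
    the regression line used in predicting X from Y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    sp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    b_xy = sp / ss_y  # slope: sum of products over sum of squares of Y

    # Direct route: square each residual X_i - X'_i and add.
    direct = sum(
        (xi - (b_xy * (yi - mean_y) + mean_x)) ** 2 for xi, yi in zip(x, y)
    )
    # Shortcut: subtract SP^2 / SS_y from the sum of squares of X.
    shortcut = ss_x - sp ** 2 / ss_y
    return direct, shortcut, ss_x


# Hypothetical paired observations.
y = [2, 5, 7, 9, 10]
x = [7, 6, 9, 15, 12]
direct, shortcut, ss_x = residual_sum_of_squares(y, x)
print(direct, shortcut, ss_x)
```

The direct and shortcut values should agree apart from rounding, and neither can exceed the unadjusted sum of squares of X.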

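In the analysis of covariance we proceed by calculating an adjusted 
total sum of squares on X, using a regression line with slope $b_t$ based on 
all the observations for the k groups put together. This adjusted total 
sum of squares is given by 

\[
\sum_{j=1}^{k} \sum_{i=1}^{n_j} (X_{ij} - X'_{ij})^2
= \sum_{j=1}^{k} \sum_{i=1}^{n_j} \left[ (X_{ij} - \bar{X}) - b_t (Y_{ij} - \bar{Y}) \right]^2
= \sum_{j=1}^{k} \sum_{i=1}^{n_j} (X_{ij} - \bar{X})^2
- \frac{\left[ \sum_{j=1}^{k} \sum_{i=1}^{n_j} (X_{ij} - \bar{X})(Y_{ij} - \bar{Y}) \right]^2}{\sum_{j=1}^{k} \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y})^2}
\tag{20.8}
\]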


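The next step is to calculate an adjusted within-groups sum of squares 
using the within-group regression line with slope $b_w$. This sum of squares 
is given by 

\[
\sum_{j=1}^{k} \sum_{i=1}^{n_j} (X_{ij} - X'_{ij})^2
= \sum_{j=1}^{k} \sum_{i=1}^{n_j} \left[ (X_{ij} - \bar{X}_j) - b_w (Y_{ij} - \bar{Y}_j) \right]^2
= \sum_{j=1}^{k} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j)^2
- \frac{\left[ \sum_{j=1}^{k} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j)(Y_{ij} - \bar{Y}_j) \right]^2}{\sum_{j=1}^{k} \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_j)^2}
\tag{20.9}
\]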





Thus to calculate an adjusted within-groups sum of squares on X, we 
subtract from the within-groups sum of squares on X a quantity which 
is equal to the within-groups sum of products, squared, divided by the 
within-groups sum of squares on Y. The adjusted sum of squares for 
between groups is now obtained by subtracting the adjusted within-groups 
sum of squares on X from the adjusted total sum of squares. 

20.6 

Degrees of freedom and variance estimates 

The numbers of degrees of freedom associated with the unadjusted and 
adjusted sums of squares on X are as follows: 



              Unadjusted X     Adjusted X

Between          k - 1           k - 1
Within           N - k           N - k - 1
Total            N - 1           N - 2


The number of degrees of freedom associated with the adjusted total 
sum of squares on X is $N - 2$. This sum of squares consists of squared 
residuals about a linear regression line, and as such has $N - 2$, and not 
$N - 1$, degrees of freedom. Thus one degree of freedom is lost in the 
adjustment process. The number of degrees of freedom associated with 
the adjusted within-groups sum of squares is $N - k - 1$. The number 
of degrees of freedom associated with the adjusted between-groups sum 
of squares is $k - 1$, and is unchanged because the between-groups 
regression line did not enter into the calculation of the between-groups sum 
of squares, this sum of squares being obtained by subtraction. 

The adjusted sums of squares on X are now divided by their associated 
degrees of freedom to obtain within-groups and between-groups 
variance estimates $s_w^2$ and $s_b^2$. The interpretation of these variance 
estimates is the same as in the analysis of variance, except that the null 
hypothesis under test relates to adjusted treatment means, that is, means 
that are free of the linear effect of the covariate. 

To test the significance of the difference between the adjusted means 
of X, an F ratio $s_b^2/s_w^2$ is obtained. This ratio is interpreted with 
$df = k - 1$ associated with the numerator and $df = N - k - 1$ 
associated with the denominator. 

20.7 

Computation formulas 

Computation formulas for sums of squares for X and Y are given in 
Sec. 18.6. To simplify the notation for the computation formulas for 
sums of products, denote the sums of all the observations in the jth 
group for X and Y by $T_{xj}$ and $T_{yj}$, respectively. Thus 

\[ \sum_{i=1}^{n_j} X_{ij} = T_{xj} \qquad \sum_{i=1}^{n_j} Y_{ij} = T_{yj} \]

Denote the sums of X and Y for all the observations together in the k 
groups by $T_x$ and $T_y$. Thus 

\[ \sum_{j=1}^{k} \sum_{i=1}^{n_j} X_{ij} = T_x \qquad \sum_{j=1}^{k} \sum_{i=1}^{n_j} Y_{ij} = T_y \]

The sum of products for the jth group may be represented by 

\[ \sum_{i=1}^{n_j} X_{ij} Y_{ij} = T_{xyj} \]

and the sum of products for all observations in the k groups by 

\[ \sum_{j=1}^{k} \sum_{i=1}^{n_j} X_{ij} Y_{ij} = T_{xy} \]

The computation formula for the total sum of products is 

\[ \sum_{j=1}^{k} \sum_{i=1}^{n_j} (X_{ij} - \bar{X})(Y_{ij} - \bar{Y}) = T_{xy} - \frac{T_x T_y}{N} \tag{20.10} \]

The within-groups sum of products may be obtained by 

\[ \sum_{j=1}^{k} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j)(Y_{ij} - \bar{Y}_j) = T_{xy} - \sum_{j=1}^{k} \frac{T_{xj} T_{yj}}{n_j} \tag{20.11} \]

The between-groups sum of products is 

\[ \sum_{j=1}^{k} n_j (\bar{X}_j - \bar{X})(\bar{Y}_j - \bar{Y}) = \sum_{j=1}^{k} \frac{T_{xj} T_{yj}}{n_j} - \frac{T_x T_y}{N} \tag{20.12} \]

The above formulae are applicable to groups of unequal or equal size. 
In the particular case where $n_1 = n_2 = \cdots = n_k = n$ we may, of course, 
write the term 

\[ \sum_{j=1}^{k} \frac{T_{xj} T_{yj}}{n_j} = \frac{\sum_{j=1}^{k} T_{xj} T_{yj}}{n} \tag{20.13} \]
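
The computation formulas can be tried on the totals of the illustrative example in Table 20.1. The sketch below is a minimal Python check under one stated assumption: the group 2 totals of 74 and 70 are not printed legibly in the table and are inferred here from the printed group means 9.25 and 8.75 with n = 8. The variable names are illustrative only.

```python
# Totals taken from Table 20.1; each tuple is (T_yj, T_xj, n_j).
# ASSUMPTION: the group 2 totals (74, 70) are inferred from the printed
# means 9.25 and 8.75 with n = 8, not read directly from the table.
group_totals = [(176, 117, 10), (74, 70, 8), (41, 37, 6)]
t_xy = 3248.0  # sum of the products X_ij * Y_ij over all observations

t_y = sum(ty for ty, _, _ in group_totals)
t_x = sum(tx for _, tx, _ in group_totals)
n_total = sum(n for _, _, n in group_totals)

correction = t_x * t_y / n_total  # T_x T_y / N
group_term = sum(ty * tx / n for ty, tx, n in group_totals)  # sum T_xj T_yj / n_j

total_sp = t_xy - correction         # Eq. 20.10
within_sp = t_xy - group_term        # Eq. 20.11
between_sp = group_term - correction # Eq. 20.12

print(round(total_sp, 2), round(within_sp, 2), round(between_sp, 2))
```

The within and between values add to the total, as the partition of Sec. 20.3 requires.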


20.8 

Summary 

In summary, to test the significance of the difference between k adjusted 
means on X using the analysis of covariance, the following steps are 
involved: 

1 Partition the total sum of squares on both Y and X 
into two components, a within-groups and a between- 
groups sum of squares, using the usual analysis of 
variance formulas. 

2 Partition the total sum of products into two com- 
ponents, a within-groups and a between-groups sum 
of products. 

3 Calculate an adjusted total sum of squares on X to 
remove the linear effects of the covariate Y. 

4 Calculate an adjusted within-groups sum of squares 
on X using the within-groups regression of X on Y. 

5 Calculate an adjusted between-groups sum of squares 
by subtraction; that is, subtract the adjusted within- 
groups sum of squares from the adjusted total sum 
of squares. 

6 Obtain the variance estimates $s_w^2$ and $s_b^2$ by divid- 
ing the adjusted within-groups sum of squares on 
X by $df = N - k - 1$, and the between-groups by 
$df = k - 1$. 

7 Test the significance of the adjusted means on X by 
referring $F = s_b^2/s_w^2$ to a table of F. 
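
The seven steps can be sketched as a single routine working from raw paired data. This is a minimal illustration in Python, not a definitive implementation; the function name `ancova_oneway`, the dictionary keys, and the data are hypothetical. The resulting F would still have to be referred to a table of F.

```python
def ancova_oneway(groups):
    """groups: list of (y_values, x_values); Y is the covariate and X the
    dependent variable. Returns adjusted sums of squares, df, and F."""
    k = len(groups)
    n_total = sum(len(xs) for _, xs in groups)
    all_y = [y for ys, _ in groups for y in ys]
    all_x = [x for _, xs in groups for x in xs]
    grand_y = sum(all_y) / n_total
    grand_x = sum(all_x) / n_total

    # Steps 1 and 2: total and within-groups sums of squares and products.
    ss_y_t = sum((y - grand_y) ** 2 for y in all_y)
    ss_x_t = sum((x - grand_x) ** 2 for x in all_x)
    sp_t = sum((x - grand_x) * (y - grand_y) for x, y in zip(all_x, all_y))
    ss_y_w = ss_x_w = sp_w = 0.0
    for ys, xs in groups:
        my, mx = sum(ys) / len(ys), sum(xs) / len(xs)
        ss_y_w += sum((y - my) ** 2 for y in ys)
        ss_x_w += sum((x - mx) ** 2 for x in xs)
        sp_w += sum((x - mx) * (y - my) for x, y in zip(xs, ys))

    # Steps 3 to 5: adjusted total, within, and (by subtraction) between.
    adj_total = ss_x_t - sp_t ** 2 / ss_y_t
    adj_within = ss_x_w - sp_w ** 2 / ss_y_w
    adj_between = adj_total - adj_within

    # Steps 6 and 7: variance estimates and the F ratio.
    df_between, df_within = k - 1, n_total - k - 1
    s_b2 = adj_between / df_between
    s_w2 = adj_within / df_within
    return {
        "adj_total": adj_total, "adj_within": adj_within,
        "adj_between": adj_between, "df_between": df_between,
        "df_within": df_within, "F": s_b2 / s_w2,
    }


# Hypothetical data: three groups of four paired observations.
groups = [
    ([3, 5, 8, 10], [6, 7, 11, 12]),
    ([6, 9, 11, 14], [12, 13, 17, 18]),
    ([10, 12, 15, 19], [15, 19, 20, 26]),
]
result = ancova_oneway(groups)
print(result)
```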

20.9 

Illustrative example : analysis of covariance 

Table 20.1 shows an artificial example illustrating the analysis of 
covariance. Information is available on two variables for k = 3 with unequal 
n's. The treatment means are $\bar{X}_1 = 11.70$, $\bar{X}_2 = 8.75$, and $\bar{X}_3 = 6.17$. 
Although these means differ appreciably, the means for the covariate 
Y also differ, these being $\bar{Y}_1 = 17.60$, $\bar{Y}_2 = 9.25$, and $\bar{Y}_3 = 6.83$. If a 
correlation of some magnitude exists between Y and X, we might 
anticipate that a substantial part of the variation in the X means will 
result from the differences in the Y means. All terms necessary for the 
direct calculation of sums of squares on X and Y and sums of products are 
given in Table 20.1. 

Table 20.2 summarizes the analysis of covariance. The adjusted 
total sum of squares for X is $527.33 - 532.00^2/1{,}248.62 = 300.66$. The 





Table 20.1 

Computation for the analysis of covariance: paired 
observations on Y and X for three groups 

                      Group 1            Group 2            Group 3
                      Y        X         Y        X         Y        X

                      5        5         6        5         4        7
                     11        6         6        7         5        8
                     12        9         7       12         7        3
                     26       12        12       10         9        4
                     28       15                            10       10
                     24       16                             6        5
                     11       18
                     27       20
                     12        4
                     20       12

Sum                 176      117                            41       37
Mean              17.60    11.70      9.25     8.75       6.83     6.17
Sum of squares    3,720    1,651       750      704        307      263
$T_j^2/n_j$    3,097.60 1,368.90    684.50   612.50     280.17   228.17

N = 24 

$T_y = 291$   $\bar{Y} = 12.13$   $T_y^2/N = 3,528.38$   $\sum\sum Y_{ij}^2 = 4,777$   $\sum T_{yj}^2/n_j = 4,062.27$ 

$T_x = 224$   $T_x^2/N = 2,090.67$   $\sum\sum X_{ij}^2 = 2,618$   $\sum T_{xj}^2/n_j = 2,209.57$ 

$T_{xy} = 3,248$   $\sum T_{xj} T_{yj}/n_j = 2,959.53$   $T_x T_y/N = 2,716.00$ 
| 1,368 90 

Sum of squares 

              Y                                   X
Between       4,062.27 - 3,528.38 =   533.89      2,209.57 - 2,090.67 = 118.90
Within        4,777.00 - 4,062.27 =   714.73      2,618.00 - 2,209.57 = 408.43
Total         4,777.00 - 3,528.38 = 1,248.62      2,618.00 - 2,090.67 = 527.33

Sum of products 

Between       2,959.53 - 2,716.00 = 243.53
Within        3,248.00 - 2,959.53 = 288.47
Total         3,248.00 - 2,716.00 = 532.00







adjusted within-groups sum of squares is 

\[ 408.43 - \frac{288.47^2}{714.73} = 292.00 \]

The adjusted between-groups sum of squares is 300.66 - 292.00 = 8.66. 
The variance estimates are $s_b^2 = 4.33$ and $s_w^2 = 14.60$, and 

\[ F = \frac{4.33}{14.60} = .30 \]


This ratio is not significant, and, indeed, it falls substantially short of 
the value of F of unity expected under the null hypothesis. Quite 
clearly almost all the variation in the X means can be attributed to the 
influence of the uncontrolled variable Y. 
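
The arithmetic of this example can be retraced from the sums of squares and products reported in Table 20.2. The following Python sketch uses only those printed values; the helper name `adjusted_ss` is illustrative.

```python
def adjusted_ss(ss_x, sp, ss_y):
    """Sum of squares on X less the squared sum of products over SS on Y."""
    return ss_x - sp ** 2 / ss_y


N, k = 24, 3

# Total and within-groups entries of Table 20.2.
adj_total = adjusted_ss(527.33, 532.00, 1248.62)
adj_within = adjusted_ss(408.43, 288.47, 714.73)
adj_between = adj_total - adj_within  # obtained by subtraction

s_b2 = adj_between / (k - 1)      # between-groups variance estimate
s_w2 = adj_within / (N - k - 1)   # within-groups variance estimate
f_ratio = s_b2 / s_w2

print(round(adj_total, 2), round(adj_within, 2), round(adj_between, 2))
print(round(s_b2, 2), round(s_w2, 2), round(f_ratio, 2))
```

Rounded to two decimals these quantities reproduce the adjusted sums of squares, the variance estimates, and the F ratio given in the text.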

It is of interest here to calculate directly the adjusted means on X. 
These adjusted values are given by 

\[ \bar{X}'_j = b_w(\bar{Y} - \bar{Y}_j) + \bar{X}_j \]

In this sample $b_w = 288.47/714.73 = .404$. The three adjusted means 
are 

\[ \bar{X}'_1 = .404(12.13 - 17.60) + 11.70 = 9.49 \]
\[ \bar{X}'_2 = .404(12.13 - 9.25) + 8.75 = 9.91 \]
\[ \bar{X}'_3 = .404(12.13 - 6.83) + 6.17 = 8.31 \]

These adjusted means vary very little one from another, a fact which 
is clearly reflected in the small F ratio. We may safely conclude that 

Table 20.2 

Analysis of covariance for data of Table 20.1 

Source of variation                            Between      Within       Total

Sum of squares: Y                               533.89      714.73    1,248.62
Sum of squares: X                               118.90      408.43      527.33
Sum of products                                 243.53      288.47      532.00
Degrees of freedom                                   2          21          23
Adjusted sum of squares: X                        8.66      292.00      300.66
Degrees of freedom for adjusted sum of squares       2          20          22
Variance estimates                        $s_b^2 = 4.33$   $s_w^2 = 14.60$

F = 4.33/14.60 = .30          p > .05 







the differences between the unadjusted means for X are due largely to 
the effects of Y. 
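
The adjusted means can be recomputed from the group means and the within-groups slope. The sketch below is a small Python illustration using the values printed in the text (grand mean of Y 12.13, group means from Table 20.1); the variable names are illustrative.

```python
# Within-groups slope: within sum of products over within SS on Y.
b_w = 288.47 / 714.73

grand_mean_y = 12.13
# (mean of Y, mean of X) for groups 1, 2, and 3, as given in the text.
group_means = [(17.60, 11.70), (9.25, 8.75), (6.83, 6.17)]

# Adjusted mean: move each X mean along the within-groups line to the
# point where the covariate equals the grand mean of Y.
adjusted = [b_w * (grand_mean_y - my) + mx for my, mx in group_means]

for j, value in enumerate(adjusted, start=1):
    print(j, round(value, 2))
```

Groups with a covariate mean above the grand mean are adjusted downward and those below it upward, which is why the three adjusted means lie so much closer together than the unadjusted ones.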


20.10 

Homogeneity of regression coefficients 

For certain purposes it may be a matter of interest to test the hypothesis 
that the slopes of the regression lines within the k groups are the same; 
that is, 

\[ H_0\colon \beta_1 = \beta_2 = \cdots = \beta_k \]

This hypothesis is assumed to be true in any application of the analysis 
of covariance. 

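To test this hypothesis, we require that the sum of squares for Y 
and X and the sum of products for each of the k treatments be given 
separately. These values are given by 

\[ \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j)^2 = \sum_{i=1}^{n_j} X_{ij}^2 - \frac{T_{xj}^2}{n_j} \]

\[ \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_j)^2 = \sum_{i=1}^{n_j} Y_{ij}^2 - \frac{T_{yj}^2}{n_j} \]

\[ \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_j)(X_{ij} - \bar{X}_j) = \sum_{i=1}^{n_j} X_{ij} Y_{ij} - \frac{T_{xj} T_{yj}}{n_j} \]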

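The next step is to calculate separately for each group an adjusted 
sum of squares on X in the manner previously described. This is done 
by subtracting from the sum of squares for X a quantity equal to the 
square of the sum of products divided by the sum of squares for Y. 
These adjusted sums of squares are summed over the k groups to obtain 

\[
\sum_{j=1}^{k} \left\{ \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j)^2
- \frac{\left[ \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j)(Y_{ij} - \bar{Y}_j) \right]^2}{\sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_j)^2} \right\} = A
\tag{20.14}
\]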


For convenience we denote this term by A. This term is a sum of 
squares of residuals about the k individual regression lines. It has 
associated with it 

\[ \sum_{j=1}^{k} (n_j - 2) = \sum_{j=1}^{k} n_j - 2k = N - 2k \]

degrees of freedom. The number of degrees of freedom for each regression 
line is $n_j - 2$, and there are k such lines. We now obtain an adjusted 
within-groups sum of squares on X for all observations in the k groups 
in the manner described previously in this chapter. Denote this sum 
of squares by B. This sum of squares has $N - k - 1$ degrees of freedom. 

To test for homogeneity of regression the following F ratio is 
calculated: 

\[ F = \frac{(B - A)/(k - 1)}{A/(N - 2k)} \tag{20.15} \]

with $k - 1$ and $N - 2k$ degrees of freedom associated with the 
numerator and denominator, respectively. 

The general rationale underlying the above F ratio is that when 
$H_0\colon \beta_1 = \beta_2 = \cdots = \beta_k$ is true, the sum of squares of residuals about 
the individual group regression lines with slope $b_j$ will be the same, within 
the limits of sampling error, as the sum of squares of residuals about a 
single within-groups regression line with slope $b_w$. Under this 
circumstance $B - A$ will depart from zero only because of the presence of 
sampling error, and the expected value of the F ratio is unity. When 
$H_0\colon \beta_1 = \beta_2 = \cdots = \beta_k$ is not true, B will tend to be greater than A 
and the expected value of F will be greater than unity. 

Table 20.3 uses the data of Table 20.1 to illustrate a test of homo- 
geneity of regression. Sums of squares on Y and X , sums of products, 


Table 20.3 

Computation for test of homogeneity of regression 
coefficients using data of Table 20.1 

             Sum of squares
                                     Sum of       Adjusted sum of
Group          Y          X         products       squares for X

1           622.40     282.10       281.80           154.51
2            65.50      91.50         4.50            91.19
3            26.83      34.83         2.17            34.65

Total       714.73     408.43       288.47           280.35 = A

A = 280.35      B = 408.43 - 288.47^2/714.73 = 292.00 

\[ F = \frac{(292.00 - 280.35)/(3 - 1)}{280.35/(24 - 6)} = \frac{5.83}{15.58} = .37 \]







and adjusted sums of squares for X have been calculated for each of the 
three groups separately. The adjusted sums of squares are summed to 
obtain A. The Y, X, and sum-of-products columns are summed to 
obtain within-groups sums of squares and products. An adjusted 
within-groups sum of squares on X is calculated, and denoted by B. An F ratio 
is calculated and is found to be .37, which is, of course, not significant. 

In this example the regression slopes for the three groups are 
respectively $b_1 = .45$, $b_2 = .07$, and $b_3 = .08$. Also, $b_w = .404$. Given the 
small samples used in this illustrative example, the differences between 
such coefficients would not prove to be significant. In fact, the 
differences are less than chance expectation. 
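
The test of homogeneity of regression can be retraced from the per-group sums of squares and products. The Python sketch below is illustrative only; the function name `homogeneity_f` is hypothetical. One value is an assumption that should be noted: the sum of squares on Y for group 2 is taken as 65.50, which is what the Table 20.1 entries 750 - 684.50 give and what makes the three groups add to the within-groups total of 714.73.

```python
def homogeneity_f(groups):
    """groups: list of (ss_y, ss_x, sp, n) for each group.

    Returns A, B, and the F ratio of Eq. 20.15.
    """
    k = len(groups)
    n_total = sum(n for _, _, _, n in groups)

    # A: residuals about the k individual regression lines.
    a = sum(ss_x - sp ** 2 / ss_y for ss_y, ss_x, sp, _ in groups)

    # B: residuals about the single within-groups regression line.
    ss_y_w = sum(g[0] for g in groups)
    ss_x_w = sum(g[1] for g in groups)
    sp_w = sum(g[2] for g in groups)
    b = ss_x_w - sp_w ** 2 / ss_y_w

    f_ratio = ((b - a) / (k - 1)) / (a / (n_total - 2 * k))
    return a, b, f_ratio


groups = [
    # (SS on Y, SS on X, sum of products, n)
    (622.40, 282.10, 281.80, 10),
    (65.50, 91.50, 4.50, 8),   # ASSUMPTION: SS on Y = 750 - 684.50
    (26.83, 34.83, 2.17, 6),
]
a, b, f_ratio = homogeneity_f(groups)
slopes = [sp / ss_y for ss_y, _, sp, _ in groups]  # b_1, b_2, b_3

print(round(a, 2), round(b, 2), round(f_ratio, 2))
print([round(s, 2) for s in slopes])
```

The ratio is referred to a table of F with k - 1 and N - 2k degrees of freedom, here 2 and 18.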


20.11 

The extended use of the analysis of covariance 

The analysis of covariance, as described in this chapter, assumes that 
the regression of X on Y is linear. The method may be extended to 
deal with situations where the regression is nonlinear. Also, we have 
considered situations involving only one uncontrolled variable or covari- 
ate. The analysis of covariance may be used with more than one uncon- 
trolled variable. This involves the use of a multiple regression method 
of the type described in Chap. 24. Our description of the analysis of 
covariance applies to single-factor experiments. The method may be 
adapted for use with two-way, and higher-order, factorial experiments. 
For a more comprehensive discussion of the analysis of covariance the 
reader is referred to B. J. Winer 


EXERCISES 

1 The following are paired observations for three experimental groups: 

              I                 II                III
          Y       X         Y       X         Y       X

          2       7         8      15        15      30
          5       6        12      24        16      35
          7       9        15      25        20      32
          9      15        18      19        24      38
         10      12        19      31        30      40

Mean   6.60    9.80     14.40   22.80     21.00   35.00






In this example Y is the covariate or concomitant variable. Calcu- 
late the adjusted total, within groups, and between groups sums of 
squares on X , and test the significance of the differences between the 
adjusted means on X using the appropriate F ratio. 

2 For the data of Exercise 1 above calculate: (a) the slope of the over-all 
regression line $b_t$ for predicting X from Y; (b) the slope of the over-all 
within-groups regression line $b_w$; (c) the slope of the between-groups 
regression line $b_b$. 

3 Calculate for Exercise 1 above the adjusted means on X. 

4 Test the homogeneity of the slopes of the three within-groups regres- 
sion lines in Exercise 1 above. 



Trend Analysis 


21.1 

Introduction 

In experiments where the treatment, or independent, variable is nominal, 
the analysis of the data cannot be extended beyond an F test applied 
to the group means and the comparison of means either two at a time 
or in subgroups. In some experiments the treatment variable is of the 
interval or ratio type. Examples are experiments on the behavioral 
effects of different dosages of a drug, different periods of practice in 
learning a task, or different numbers of reinforcements in conditioning 
or extinction. With such experiments the investigator may extend his 
analysis to an examination of certain characteristics of the shape of the 
relation between the treatment variable and the experimental variable. 
Questions of the following type may be raised. Do the group means 
increase significantly in a linear fashion with increase in the treatment 
variable? Is a straight line a good fit to the group means, or do signifi- 
cant deviations from linearity exist? Do the group means increase and 
then decrease with increase in the treatment variable in an inverted U 
fashion? Trend analysis as described in this chapter provides answers 
to questions of this kind. It is an application of the analysis of variance. 

In most applications of trend analysis the investigator is usually 
not interested in the development of a precise equation to describe the 
functional relation between the treatment variable and the experimental 
variable, although such an equation could readily be obtained. Usually 
concern is with questions of significance. 

The procedures described in this chapter assume that the levels of 
the treatment variable are equally spaced. Equal spacing is not a 
necessary condition for the application of methods of trend analysis. It leads, 
however, to simplification in computation. 





21.2 

Linear trend : partitioning the sum of squares 

Consider an experiment in which k equally spaced treatments are 
administered to k groups, composed of $n_1, n_2, \ldots, n_k$ members. In applying 
an analysis of variance for one-way classification to such data, the total 
sum of squares is partitioned into two parts, a within-groups and a 
between-groups sum of squares. These are used in an F test of the 
hypothesis $H_0\colon \mu_1 = \mu_2 = \cdots = \mu_k$. The group means $\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_k$ 
may, however, show some systematic tendency to either increase or 
decrease in a linear fashion with increase or decrease in the treatment 
variable. Consequently, the investigator may choose to concern himself 
with questions of linear trend. 

A linear regression line may be fitted to the group means. This line 
may be used to predict the group means from values of the treatment 
variable. The mean for group j is $\bar{X}_j$. Denote the corresponding value 
predicted from the regression line by $\bar{X}'_j$. The total sum of squares may 
now be partitioned into three independent parts. We begin by writing 
an identity 

\[ (X_{ij} - \bar{X}) = (X_{ij} - \bar{X}_j) + (\bar{X}_j - \bar{X}'_j) + (\bar{X}'_j - \bar{X}) \tag{21.1} \]

This identity states that the deviation of a particular score $X_{ij}$ from the 
grand mean $\bar{X}$ may be viewed as composed of three parts: (1) a deviation 
of the score $X_{ij}$ from the mean of the group $\bar{X}_j$ to which it belongs; (2) a 
deviation of $\bar{X}_j$ from the value predicted from the linear regression line; 
and (3) a deviation of the predicted value $\bar{X}'_j$ from the grand mean. 
This identity is squared, summed over the $n_j$ cases in the jth group, and 
summed over the k groups. The cross-product terms vanish, and we 
obtain 

\[
\sum_{j=1}^{k} \sum_{i=1}^{n_j} (X_{ij} - \bar{X})^2
= \sum_{j=1}^{k} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j)^2
+ \sum_{j=1}^{k} n_j (\bar{X}_j - \bar{X}'_j)^2
+ \sum_{j=1}^{k} n_j (\bar{X}'_j - \bar{X})^2
\tag{21.2}
\]

Thus the total sum of squares is partitioned into three parts. The first 
part is a within-groups sum of squares, and has associated with it $N - k$ 
degrees of freedom. The second part is a sum of squares of deviations 
of array means from linear regression. If the group means fall exactly 
on a straight line, this sum of squares is zero. It has associated with it 
$k - 2$ degrees of freedom. The third part is a sum of squares of 
deviations of the predicted values, $\bar{X}'_j$, from the grand mean. It has 
associated with it one degree of freedom. This sum of squares is closely 
related to the slope $b_{xy}$ of the regression line. It may be shown that 

\[ \sum_{j=1}^{k} n_j (\bar{X}'_j - \bar{X})^2 = N b_{xy}^2 s_y^2 \tag{21.3} \]

where $s_y^2$ is the variance of the treatment variable. For a particular 
experiment, the quantity $N s_y^2$ exhibits no freedom of variation. It may 
be viewed as fixed. Variation in the sum of squares depends, therefore, 
on variation in one quantity only, $b_{xy}$, the slope of the line. 
Consequently, the sum of squares has associated with it one degree of freedom. 

The three sums of squares are divided by their associated degrees 
of freedom. Three variance estimates or mean squares are obtained. 
These are $s_w^2$, the within-groups mean square; $s_d^2$, the mean square for 
deviations from the linear regression line; and $s_l^2$, the linear regression 
mean square. Two F ratios may be obtained. The first is $F_l = s_l^2/s_w^2$, 
with one degree of freedom associated with the numerator and $N - k$ 
degrees of freedom associated with the denominator. This provides a 
test of linear trend. It tests whether the slope of the regression line is 
significantly different from zero. The second F ratio is $F_d = s_d^2/s_w^2$, 
with $k - 2$ degrees of freedom associated with the numerator and $N - k$ 
with the denominator. This provides a test of departure from linearity. 
It tests whether the variance of deviations from linearity is significantly 
different from zero. If $F_d$ is significant we may conclude that a straight 
regression line is not a good fit to the set of group means. 


21.3 

Computation formulas for linear trend 

The calculation involved in the analysis of data for trend with equally 
spaced treatment levels may be simplified by using a computation 
variable. In the process of calculation the values of the computation 
variable are assigned to different levels of the treatment variable. For 
k = 3 to k = 7 the computation variables, denoted by $c_{1j}$, are as follows: 

k

3     -1    0    1
4     -3   -1    1    3
5     -2   -1    0    1    2
6     -5   -3   -1    1    3    5
7     -3   -2   -1    0    1    2    3

Thus for k = 3, $c_{1j}$ is -1, 0, 1. These are, in fact, coefficients for 
orthogonal polynomials in the linear case, and are discussed later in this 
chapter. Values for k > 7 may be obtained from Table J of the Appendix. 





For the present these coefficients may be viewed as a convenient
computational device, analogous perhaps to the computation variable used in
calculating a mean or standard deviation from grouped data.

The computation formulas for the total and within-groups sums of 
squares are the same as for one-way classification, and are given in 
Chap. 18. 

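For unequal n's the sum of squares for linear regression is given by

$$\sum_{j=1}^{k} n_j(\bar{X}'_j - \bar{X})^2 = \frac{\left(\sum_{j=1}^{k} c_{1j}T_j - T\sum_{j=1}^{k} n_j c_{1j}/N\right)^2}{\sum_{j=1}^{k} n_j c_{1j}^2 - \left(\sum_{j=1}^{k} n_j c_{1j}\right)^2/N} \tag{21.4}$$

The deviation sum of squares may be obtained by subtracting the within
and linear regression sums of squares from the total sum of squares.
For equal n's the sum of squares for linear regression reduces to

$$\frac{\left(\sum_{j=1}^{k} c_{1j}T_j\right)^2}{n\sum_{j=1}^{k} c_{1j}^2} \tag{21.5}$$
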


21.4 

Illustrative example of analysis for linear trend 

The computation required for an analysis of variance for linear trend is
illustrated in Table 21.1. The data are those used previously in Table
18.2. The total sum of squares is 168.15, as before, and the within-
groups sum is 85.93. The linear regression sum of squares is

$$\frac{\left(\sum c_{1j}T_j - T\sum n_j c_{1j}/N\right)^2}{\sum n_j c_{1j}^2 - \left(\sum n_j c_{1j}\right)^2/N} = \frac{[-74 - (146 \times -4)/26]^2}{138 - 16/26} = 19.34$$

The deviation sum of squares, obtained by subtraction, is

$$168.15 - 85.93 - 19.34 = 62.88$$

The F ratio for linear regression is significant at a little better than the 
.05 level. The F ratio for deviations from linear regression is significant 
at better than the .01 level. If the .05 level is accepted as a suitable 





Table 21.1

Computation for linear trend analysis with unequal n's
using the data of Table 18.2

                            Method
                    1         2         3         4
$\bar{X}_j$        5.38      8.40      6.14      3.00
$n_j$                 8         5         7         6
$T_j$                43        42        43        18
$c_{1j}$             -3        -1         1         3
$\sum X_{ij}^2$     269       364       287        68
$T_j^2/n_j$      231.13    352.80    264.14     54.00

$N = 26$
$T = 146$
$T^2/N = 819.85$
$\sum n_j c_{1j} = -4$
$\sum c_{1j}T_j = -74$
$\sum n_j c_{1j}^2 = 138$
$\sum\sum X_{ij}^2 = 988$
$\sum T_j^2/n_j = 902.07$



Table 21.2

Linear trend analysis for the data of Table 21.1

Source of             Sum of      Degrees of     Variance
variation             squares     freedom        estimate

Linear regression      19.34          1          $s_l^2 = 19.34$
Deviation              62.88          2          $s_d^2 = 31.44$
Within                 85.93         22          $s_w^2 = 3.91$
Total                 168.15         25

$$F_1 = \frac{19.34}{3.91} = 4.95 \qquad F_2 = \frac{31.44}{3.91} = 8.04$$
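
The arithmetic of Tables 21.1 and 21.2 can be checked with a short script. This is a sketch only: it assumes the summary figures as tabulated (group totals, n's, the linear coefficients, and the sum of all squared observations), and the function name `linear_trend_ss` is hypothetical.

```python
def linear_trend_ss(totals, ns, coefs):
    """Linear regression sum of squares for unequal n's (formula 21.4)."""
    n_total = sum(ns)
    grand_total = sum(totals)
    sum_ct = sum(c * t for c, t in zip(coefs, totals))
    sum_nc = sum(n * c for n, c in zip(ns, coefs))
    sum_nc2 = sum(n * c * c for n, c in zip(ns, coefs))
    numerator = (sum_ct - grand_total * sum_nc / n_total) ** 2
    denominator = sum_nc2 - sum_nc ** 2 / n_total
    return numerator / denominator


# Summary values read from Table 21.1.
totals = [43, 42, 43, 18]
ns = [8, 5, 7, 6]
coefs = [-3, -1, 1, 3]
sum_squares = 988          # sum of all squared observations

n_total = sum(ns)
k = len(ns)
ss_total = sum_squares - sum(totals) ** 2 / n_total
ss_within = sum_squares - sum(t * t / n for t, n in zip(totals, ns))
ss_linear = linear_trend_ss(totals, ns, coefs)
ss_deviation = ss_total - ss_within - ss_linear

ms_within = ss_within / (n_total - k)
f_linear = ss_linear / ms_within                  # df = 1, N - k
f_deviation = (ss_deviation / (k - 2)) / ms_within  # df = k - 2, N - k

print(round(ss_linear, 2), round(ss_deviation, 2))
print(round(f_linear, 2), round(f_deviation, 2))
```

Because the script carries full precision, the last decimal place of an F ratio may differ slightly from a value computed from rounded mean squares.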








nomial regression equation as shown in formula 21.6. With orthogonal
polynomials the values of $\bar{X}'_j$ are obtained by using an orthogonal
polynomial as shown in formula 21.7. Because for equal n's $a = \bar{X}$, we
may write

$$(\bar{X}'_j - \bar{X}) = b_1c_{1j} + b_2c_{2j} + \cdots + b_mc_{mj} \tag{21.8}$$

By substitution in equation 21.1 we obtain

$$(X_{ij} - \bar{X}) = (X_{ij} - \bar{X}_j) + (\bar{X}_j - \bar{X}'_j) + b_1c_{1j} + b_2c_{2j} + \cdots + b_mc_{mj} \tag{21.9}$$

This equation is squared, summed over the $n$ cases in the $j$th group, the
n's for the $k$ groups being considered equal, and summed over the $k$
groups. The cross-product terms vanish, and we obtain

$$\sum_{j=1}^{k}\sum_{i=1}^{n}(X_{ij} - \bar{X})^2 = \sum_{j=1}^{k}\sum_{i=1}^{n}(X_{ij} - \bar{X}_j)^2 + n\sum_{j=1}^{k}(\bar{X}_j - \bar{X}'_j)^2 + nb_1^2\sum_{j=1}^{k}c_{1j}^2 + nb_2^2\sum_{j=1}^{k}c_{2j}^2 + \cdots + nb_m^2\sum_{j=1}^{k}c_{mj}^2 \tag{21.10}$$

Thus the total sum of squares is partitioned into a within-group sum of
squares, a sum of squares of deviations of group means from the values
$\bar{X}'_j$ obtained from the orthogonal polynomial regression equation, and a
series of sums of squares each associated with a regression component,
linear, quadratic, cubic, quartic, and so on.

The within-group sum of squares has $k(n - 1)$ degrees of freedom.
The deviation sum of squares has $k - 2$ degrees of freedom for a linear
equation, $k - 3$ for a quadratic equation, $k - 4$ for a cubic equation,
and $k - m - 1$ degrees of freedom in the general case where $m$ is the
degree of the polynomial. Each regression sum of squares has one
degree of freedom. $F$ ratios may be formed to test the deviation and the
regression components.


To illustrate the meaning of the various regression components, consider
five experiments, A, B, C, D, E. Each experiment has five equally
spaced treatments, I, II, III, IV, V. Let the arithmetic means for the
five experiments be as follows:

        I     II    III    IV     V

A      20     25     30    35    40
B      20     25     30    25    20
C      20     28     34    38    40
D      20     25     30    25    30
E      20     25     20    25    20






For experiment A all means fall exactly on a straight line. A linear
regression component will be obtained, but no higher-order or deviation
components. For experiment B the means increase and then decrease.
A quadratic regression component will be obtained, but no linear
component because the means show no over-all tendency to either increase or
decrease in a linear fashion. For experiment C the means show an over-all
tendency to increase, but not in a linear fashion. A decreasing
increment is observed from one mean to the next. Here a linear regression
component corresponding to the over-all increase in means will
result; a quadratic component will also result, which will reflect, as it
were, the bending nature of the relation. Experiment D illustrates a
cubic relation. A linear component will also be obtained which reflects
the over-all tendency of the means to increase. A quadratic component
will reflect the asymmetrical nature of the cubic relation. Experiment E
illustrates a quartic relation. For this experiment the linear and cubic
components will be zero; the quartic component predominates, although
with the means exactly as tabulated a smaller quadratic component also
appears.
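
The components for the five experiments can be checked numerically. The sketch below is illustrative only. It assumes the standard tabulated orthogonal polynomial coefficients for five equally spaced levels (these are not printed in this section), and it reports each component per unit of $n$, that is, $(\sum c_j\bar{X}_j)^2/\sum c_j^2$.

```python
# Standard orthogonal polynomial coefficients for k = 5 (assumed from
# published tables, not from this section).
COEFS = {
    "linear":    [-2, -1, 0, 1, 2],
    "quadratic": [2, -1, -2, -1, 2],
    "cubic":     [-1, 2, 0, -2, 1],
    "quartic":   [1, -4, 6, -4, 1],
}

# Means of the five illustrative experiments.
MEANS = {
    "A": [20, 25, 30, 35, 40],
    "B": [20, 25, 30, 25, 20],
    "C": [20, 28, 34, 38, 40],
    "D": [20, 25, 30, 25, 30],
    "E": [20, 25, 20, 25, 20],
}


def components(means):
    """Return each trend component, per unit n, for one set of means."""
    result = {}
    for name, cs in COEFS.items():
        contrast = sum(c * m for c, m in zip(cs, means))
        result[name] = contrast ** 2 / sum(c * c for c in cs)
    return result


for label, means in MEANS.items():
    parts = components(means)
    print(label, {name: round(value, 2) for name, value in parts.items()})
```

Run on the tabulated means, experiment A yields only a linear component, C only linear and quadratic components, and E zero linear and cubic components with a quartic component accompanied by a smaller quadratic one.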


21.8 

Computation formulas for trend analysis 
using orthogonal polynomials 


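The computation formulas for the total and within-groups sum of squares
are the same as for one-way classification, and are given in Chap. 18.
For equal n's the various regression components are given by

$$nb_1^2\sum_{j=1}^{k}c_{1j}^2 = \frac{\left(\sum_{j=1}^{k}c_{1j}T_j\right)^2}{n\sum_{j=1}^{k}c_{1j}^2} \tag{21.11}$$

$$nb_2^2\sum_{j=1}^{k}c_{2j}^2 = \frac{\left(\sum_{j=1}^{k}c_{2j}T_j\right)^2}{n\sum_{j=1}^{k}c_{2j}^2} \tag{21.12}$$

and so on.
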


21.9

Illustrative example of trend analysis
using orthogonal polynomials

Table 21.3 shows an illustrative example of the computation required for
a trend analysis using orthogonal polynomials. In this example $n = 8$,
the groups being of equal size, and $k = 4$. Table 21.4 shows the
corresponding analysis of variance table. In this example the linear
component is significant with $p < .01$. Neither the quadratic nor the cubic
regression component is significant. The reader will note that the
deviation component is zero. This occurs because a cubic equation will
always fit four points exactly, just as a linear equation will always fit two
points exactly, and a quadratic equation three points. In this example
with $k = 4$ the analysis has been carried as far as possible.


Table 21.3

Illustrative example of computation for trend analysis
using orthogonal polynomials

                          Treatment
                     I       II      III       IV

                     5        4        5       14
                     7       12        8        7
                     6        8       17        6
                     3        7       26       12
                     9        7       14       15
                     7        6        8       19
                     4        4       13        8
                     2        9       11       11

$n$                  8        8        8        8
$T_j$               43       57      102       92
$\bar{X}_j$       5.38     7.13    12.75    11.50
$c_{1j}$            -3       -1        1        3
$c_{2j}$             1       -1       -1        1
$c_{3j}$            -1        3       -3        1
$\sum X_{ij}^2$    269      455    1,604    1,196

$T = 294$
$T^2/N = 2,701.13$
$\sum T_j^2/n = 2,995.75$
$\sum\sum X_{ij}^2 = 3,524$
$\sum c_{1j}T_j = 192$        $\sum c_{1j}^2 = 20$
$\sum c_{2j}T_j = -24$        $\sum c_{2j}^2 = 4$
$\sum c_{3j}T_j = -86$        $\sum c_{3j}^2 = 20$

Sum of squares

Linear        $192^2/(8 \times 20)$                              = 230.40
Quadratic     $(-24)^2/(8 \times 4)$                             =  18.00
Cubic         $(-86)^2/(8 \times 20)$                            =  46.23
Deviation     822.87 - 528.24 - 230.40 - 18.00 - 46.23           =    .00
Within        3,524.00 - 2,995.76                                = 528.24
Total         3,524 - 2,701.13                                   = 822.87
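
The sums of squares for the example of Table 21.3 can be reproduced from the group totals alone. The following is a minimal sketch, assuming equal n's and the summary values as tabulated; the helper name `component_ss` is hypothetical.

```python
def component_ss(totals, coefs, n):
    """Regression sum of squares for one orthogonal component, equal n's."""
    contrast = sum(c * t for c, t in zip(coefs, totals))
    return contrast ** 2 / (n * sum(c * c for c in coefs))


# Summary values read from Table 21.3.
totals = [43, 57, 102, 92]   # group totals T_j
n = 8                        # cases per group
sum_squares = 3524           # sum of all squared observations

coefficients = {
    "linear":    [-3, -1, 1, 3],
    "quadratic": [1, -1, -1, 1],
    "cubic":     [-1, 3, -3, 1],
}

k = len(totals)
n_total = n * k
ss = {name: component_ss(totals, cs, n) for name, cs in coefficients.items()}
ss_total = sum_squares - sum(totals) ** 2 / n_total
ss_within = sum_squares - sum(t * t for t in totals) / n
ss_deviation = ss_total - ss_within - sum(ss.values())

ms_within = ss_within / (k * (n - 1))
f_ratios = {name: value / ms_within for name, value in ss.items()}

for name in coefficients:
    print(name, round(ss[name], 2), round(f_ratios[name], 2))
print("deviation", round(abs(ss_deviation), 2))
```

With four groups and three components removed, the deviation sum of squares is zero apart from floating-point rounding, which illustrates why a cubic fits four means exactly.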


21.10 

Trend analysis with unequal n’s 

If a single trend component only is under study, whether linear,
quadratic, cubic, or of a higher order, the n's may be equal or unequal. If the
intent of the investigator is to examine only the linear and quadratic
components and no higher-order components, the experiment should be
designed such that the n's are either equal or constitute a symmetrical
set, such as, for example, 4, 10, 4, for $k = 3$, or 10, 20, 20, 10, for $k = 4$.
The linear and quadratic components are orthogonal when
$\sum n_jc_{1j}c_{2j} = 0$. For orthogonal components with unequal n's the linear
regression sum of squares is given by $(\sum c_{1j}T_j)^2/\sum n_jc_{1j}^2$, and the
quadratic by $(\sum c_{2j}T_j)^2/\sum n_jc_{2j}^2$.

For higher-order trend analysis it is advisable to design the experiment
such that the n's are equal. The reason for this is that the various
score components, $b_1c_{1j}$, $b_2c_{2j}$, and the like, in the equation for an
orthogonal polynomial, are orthogonal for equal n's. For unequal n's the
components will not be orthogonal, except under rather special
circumstances. The first three components form an orthogonal set when


Table 21.4

Analysis of variance using trend analysis for data of Table 21.3

Source of                Sum of      Degrees of     Variance
variation                squares     freedom        estimate

Linear regression        230.40          1          $s_l^2 = 230.40$
Quadratic regression      18.00          1          $s_q^2 = 18.00$
Cubic regression          46.23          1          $s_c^2 = 46.23$
Deviation                   .00          0
Within                   528.24         28          $s_w^2 = 18.87$
Total                    822.87         31

$$F_l = \frac{230.40}{18.87} = 12.21 \qquad F_q = \frac{18.00}{18.87} = .95 \qquad F_c = \frac{46.23}{18.87} = 2.45$$





$\sum n_jc_{1j}c_{2j} = \sum n_jc_{1j}c_{3j} = \sum n_jc_{2j}c_{3j} = 0$. Clearly for unequal n's
this set of components will not ordinarily be orthogonal.

In practical experimental work, situations arise where the n's are
unequal. If the departures from equality are not gross, methods may
be used identical to those described in Chap. 19 for making adjustments
for unequal n's in the analysis of variance for two-way classification.

21.11

Trend analysis: correlated data

The methods of trend analysis described above have application to
experimental data obtained from $k$ independent groups. These methods
may be extended to correlated data with measurements obtained on a
group of $n$ subjects under $k$ conditions. Such data are illustrated in
Table 19. V, Sec. 19.10. The columns represent four conditions and the
rows 10 subjects. Variance estimates are obtained for rows, columns,
and interaction. The difference between column means is tested using
$F = s_c^2/s_{rc}^2$, the denominator being the interaction variance estimate.

If the differences between the experimental conditions are equally
spaced, it is appropriate to analyze the column means for trend using the
method of orthogonal polynomials. The method is essentially the same
as that used for $k$ independent groups. The one difference is that for $k$
independent groups the within-groups variance estimate, $s_w^2$, is used as
the error term to test the regression and deviation components, whereas
for correlated data the interaction variance estimate $s_{rc}^2$ is used in the
denominator of the $F$ ratio.

Thus the $F$ test for linear regression is $F = s_l^2/s_{rc}^2$ with $df_1 = 1$ and
$df_2 = (R - 1)(C - 1) = (n - 1)(k - 1)$. The test for quadratic regression
is $F = s_q^2/s_{rc}^2$, and so on. The test for the deviation component is
$F = s_d^2/s_{rc}^2$ with $df_1 = C - m - 1$, $m$ being the number of regression
components removed, and $df_2 = (n - 1)(k - 1)$.

21.12 

Extended applications of trend analysis 

The methods of trend analysis described in this chapter have application 
to experiments involving one-way classification. The methods may be 
applied to two-way, and higher-order, factorial experiments. With a 
two-way factorial experiment of the type described in Chap. 19, both row 
and column means may be analyzed for trend, using orthogonal poly- 
nomials as in the one-way classification case. The various regression 
and deviation components for rows and columns may be tested using 
$s_w^2$ in the denominator of the $F$ ratio. Also in factorial experiments the
interaction between treatments may be analyzed for trend. In a two-way 





factorial experiment with R rows and C columns the analysis of inter- 
action for trend is a method for studying the response surface which the 
RC cell means define. Do all cell means fall in a plane? Do they form 
a response surface with quadratic characteristics? Questions of this 
general kind, and others, can be answered by the analysis of interaction 
for trend. Such methods are described in detail by Winer (1962).


EXERCISES 

1 Consider the following data:

                      I         II        III

   $n$               20         20         20
   $\bar{X}_j$      7.40      10.50      12.65
   $\sum X_{ij}^2$  2,150      2,150      2,760

   Calculate (a) the within-group, linear regression, and deviation sums
   of squares, (b) the three variance estimates, (c) F ratios for testing
   the significance of linear regression and the deviations from linear
   regression.

2 Obtain the slope of the linear regression line for the data of Exercise 1
   above, assuming a unit difference between the levels of the treatment
   variable.

3 What are the coefficients for orthogonal polynomials $c_{1j}$, $c_{2j}$, $c_{3j}$ for
   $k = 6$?

4 Consider the following data:

                      I       II      III       IV        V

   $n$               10       10       10       10       10
   $\bar{X}_j$      5.50     8.65    10.43     4.86     9.50
   $\sum X_{ij}^2$   305      875     1050      305      905

   Calculate the linear regression, quadratic regression, cubic regression,
   deviation, and within-groups variance estimates and apply the
   appropriate tests of significance.

5 What difficulties will attach to the analysis of experimental data for
   quadratic, cubic, and higher order trend, when the n's in the k groups
   are unequal?



Selected Nonparametric Tests


22.1

Introduction


Many tests of significance involve assumptions about the nature of the
distributions of the variables in the populations from which the samples
are drawn. The $t$ test and the analysis of variance, for example, assume
normality of the parent distributions. In experimental work situations
arise where either little is known about the population distributions or
these distributions are known to depart appreciably from the normal
form. In such situations nonparametric tests may be appropriately used.
Nonparametric tests make few assumptions about the properties of the
parent distributions. Assumptions about the parent distribution are
involved in nonparametric tests, but these are usually fewer in number,
weaker, and easier to satisfy in data situations. Nonparametric tests
are frequently spoken of as distribution-free tests. The implication is
that they are free, or independent, of some characteristics of the
population distributions.

The reader will recall the distinction between nominal, ordinal,
interval, and ratio variables. Nonparametric methods are appropriate
for nominal and ordinal data; parametric methods for interval and ratio
data. In practice, nonparametric methods are frequently used with
data of this latter type. The data are reduced to a form such that a
nominal, or ordinal, statistical procedure may be applied to them. An
important class of nonparametric tests employs only the sign properties
of the data. All observations above a fixed value, such as the median,
may be assigned a plus, and all below, a minus. The original variable is
replaced by, or transformed to, another variable which takes the sign
values plus or minus. Another class of nonparametric test employs the
rank properties of the data. The original observations are replaced by
the numbers 1, 2, 3, . . . , $N$. Subsequent statistical manipulation and
inferences are based on ranks.

Nonparametric statistics when applied to interval and ratio data use 
only part of the information available. It is intuitively obvious that if 





measurements are transformed to variables employing only signs or
ranks, something is lost in the process. In data where the assumptions
required for a parametric test are satisfied and both parametric and
nonparametric tests may be applied, the nonparametric tests have less
power. The power of a statistical test is defined as the probability of
rejecting the null hypothesis when that hypothesis is false. The power
of a test depends in part on sample size. Two tests, $A$ and $B$, may be
compared by considering the relative sample size required to make them
equally powerful. The relative efficiency of the two tests is given by
$100(N_a/N_b)$, where $N_b$ is the number of observations required to make
test $B$ as powerful as test $A$ with $N_a$ observations. If $A$ is the most
powerful test available, the quantity $100(N_a/N_b)$ is called the power
efficiency of a test. The power efficiency of many nonparametric tests is
fairly high for small samples and decreases with sample size. Such
comparisons can of course only be made for normal distributions where
both a parametric and a nonparametric test may be applied. Since
nonparametric tests are used where little is known about the parent
distribution, the power of the test in most practical situations is
unknown.

For a comprehensive treatment of nonparametric tests the reader is
referred to Siegel (1956) and to Tate and Clelland (1957). Both books
contain useful tables.


22.2 

A sign test for two independent samples 

This test is known as the median test. It compares the medians of two
independent samples. The null hypothesis is that no difference exists
between the medians of the populations from which the samples are
drawn. The corresponding parametric test is a $t$ test for comparing the
means of independent samples. The median test is based on the idea
that in two samples drawn from the same population the expectation is
that as many observations in each sample will fall above as below the
joint median.

The data consist of two independent samples of $N_1$ and $N_2$
observations. To apply the median test the median of the combined $N_1 + N_2$
observations is calculated. In each sample, observations above the joint
median are assigned a + and those at or below it a -. The number of
+ and - signs for each sample is ascertained. A $\chi^2$ test is used to
determine whether the observed frequencies of + and - signs depart
significantly from expectation under the null hypothesis.





The following are observations for two independent samples:

Sample I     10  10  10  12  15  17  17  19  20  22  25  26
Sample II     6   7   8   8  12  16  19  19  22

The median of the $N_1 + N_2$ observations is 16. Assigning a + to values
above the median and a - to values at or below it, we obtain

Sample I      -   -   -   -   -   +   +   +   +   +   +   +
Sample II     -   -   -   -   -   -   +   +   +

These data may be tabulated in the form of a 2 X 2 table as follows:

                +      -

Sample I        7      5
Sample II       3      6

               10     11


The value of $\chi^2$ for this table with Yates's correction for continuity is .51.
The value of $\chi^2$ required for significance at the 5 per cent level is 3.84.
Obviously, in this case we have no grounds for rejecting the null hypothe- 
sis that the samples came from populations with the same median. This 
is a two-tailed test. 
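
The steps of the median test can be sketched in Python as follows. This is an illustration, not the author's procedure verbatim: the function names are hypothetical, the $\chi^2$ is computed with the usual 2 X 2 shortcut formula with Yates's correction, and the final value of Sample I is an assumed reading of an unclear figure (any value above the joint median gives the same table).

```python
from statistics import median


def median_test_table(sample_1, sample_2):
    """Return the joint median and (plus, minus) counts for each sample."""
    joint = median(sample_1 + sample_2)
    rows = []
    for sample in (sample_1, sample_2):
        above = sum(1 for x in sample if x > joint)
        rows.append((above, len(sample) - above))  # at or below counts as minus
    return joint, rows


def yates_chi_square(rows):
    """Chi-square for a 2 x 2 table with Yates's continuity correction."""
    (a, b), (c, d) = rows
    n = a + b + c + d
    corrected = max(abs(a * d - b * c) - n / 2, 0)
    return n * corrected ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Last value of sample_1 (26) is an assumed reading; see the lead-in.
sample_1 = [10, 10, 10, 12, 15, 17, 17, 19, 20, 22, 25, 26]
sample_2 = [6, 7, 8, 8, 12, 16, 19, 19, 22]

joint_median, table = median_test_table(sample_1, sample_2)
chi_square = yates_chi_square(table)
print(joint_median, table, chi_square < 3.84)
```

The computed statistic falls far short of the 5 per cent critical value for one degree of freedom, in agreement with the conclusion in the text.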

22.3 

A sign test for two correlated samples 

This test compares two correlated samples, and is applicable to data
composed of $N$ paired observations. The difference between each pair of
observations is obtained. The null hypothesis is that the median
difference between the pairs is zero. The test is based on the idea that under
the null hypothesis the expectation is that half the differences between
the paired observations will be positive and the other half negative. The
symmetrical binomial $(\frac{1}{2} + \frac{1}{2})^N$ is used to obtain the probabilities required
for a one-tailed or a two-tailed test.

The following are paired observations, $X$ and $Y$, for a sample of 10
individuals together with the sign of the difference between $X$ and $Y$:

$X$                 15   19   31   36   10   11   19   15   10   16
$Y$                 19   30   26    8   10    6   17   13   22    8
Sign of $X - Y$      -    -    +    +    0    +    +    +    -    +

Under the null hypothesis the probability that $X$ is greater than $Y$ is equal
to the probability that $Y$ is greater than $X$, which in turn is equal to $\frac{1}{2}$.
The expected numbers of + and - signs are equal. In this example we
have six plus signs, three minus signs, and one zero difference. The zero






difference is discarded. From the binomial expansion $(\frac{1}{2} + \frac{1}{2})^9$ we can
ascertain the exact probability of obtaining six or more plus signs under
the null hypothesis. This probability is .254. This is a one-tailed test.
The probability of obtaining either six or more plus signs or six or more
minus signs is .508. This is a two-tailed test. Clearly here we have no
grounds for rejecting the null hypothesis.

Where $N$ is not too small, the normal approximation to the binomial
or $\chi^2$ may be used, preferably with Yates's correction. In this case the
expected values are $N/2$. In the above example the observed values are
6 and 3, the expected values are 4.5 and 4.5, the corrected observed
values are 5.5 and 3.5, and $\chi^2 = .44$. The probability of obtaining
a $\chi^2$ equal to or greater than .44 under the null hypothesis is .507.
Although $N$ is small, this is in close agreement with the exact probability
of .508 obtained from the binomial. The reader will recall that $\chi^2$
provides the probability for a two-tailed test.

Instead of the $\chi^2$ procedure described above, a computationally
simpler method may be used. Obtain the difference between the number
of + and - signs. Denote this difference by $D$. It may be shown that

$$z = \frac{|D| - 1}{\sqrt{N}}$$

approaches the normal form as $N$ increases in size. This formula
incorporates a continuity correction. Values of 1.96 and 2.58 are required
for significance at the .05 and .01 levels of significance, respectively, for
a nondirectional test. In the above example we have six plus signs and
three minus signs. $D = 6 - 3 = 3$, and

$$z = \frac{3 - 1}{\sqrt{9}} = .67$$

Quite clearly this falls short of significance. The reader should recall
that for $df = 1$ the quantity $z^2 = \chi^2$. In the above example, if the
calculation is carried beyond two decimal places, $z^2 = .6667^2 = .4444$
and $\chi^2 = .4444$.
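
Both the exact binomial probabilities and the normal approximation can be computed directly. The following Python sketch uses only the standard library; the function name `sign_test` is hypothetical, and zero differences are discarded as described above.

```python
from math import comb, sqrt


def sign_test(x, y):
    """Sign test for paired observations; ties (zero differences) are dropped."""
    plus = sum(1 for a, b in zip(x, y) if a > b)
    minus = sum(1 for a, b in zip(x, y) if a < b)
    n = plus + minus
    larger = max(plus, minus)
    # Exact probability of a count at least this extreme in one direction
    # under the symmetrical binomial (1/2 + 1/2)^n.
    one_tailed = sum(comb(n, r) for r in range(larger, n + 1)) / 2 ** n
    two_tailed = min(1.0, 2 * one_tailed)
    # Normal approximation with the continuity correction from the text.
    z = (abs(plus - minus) - 1) / sqrt(n)
    return plus, minus, one_tailed, two_tailed, z


x = [15, 19, 31, 36, 10, 11, 19, 15, 10, 16]
y = [19, 30, 26, 8, 10, 6, 17, 13, 22, 8]

plus, minus, p_one, p_two, z = sign_test(x, y)
print(plus, minus, round(p_one, 3), round(p_two, 3), round(z, 2))
```

Squaring the returned `z` gives the corrected $\chi^2$ with one degree of freedom, so the two approximate procedures are interchangeable.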


22.4 

A sign test for k independent samples 

This is an obvious extension of the median test for two independent
samples. The data are comprised of $k$ samples of $n_1, n_2, \ldots, n_k$
observations. As before, the null hypothesis is that no difference exists
in the medians of the populations from which the samples are drawn.
The median of the combined $n_1 + n_2 + \cdots + n_k$ observations is





calculated. For each sample, observations above the joint median are
assigned a + and those either at or below the joint median a -. The
data are arranged in a 2 X $k$ contingency table, and a $\chi^2$ test applied.
The following are data for four samples.


Sample I 

3 

(i 

11 

1 1 

10 

18 

21 

33 

Sample II 

3 

3 

4 

:> 

f> 

8 

9 

1-1 

Sample III 

18 

18 

25 

20 

29 

:u 



Sample' IV 

14 

lb 

lfl 

22 

22 2.i 

27 

35 


The total number of observations is 30. The median is 18. Assigning a
+ to values above the median and a - to values at or below, we obtain

Sample I       -   -   -   -   -   -   +   +
Sample II      -   -   -   -   -   -   -   -
Sample III     -   -   +   +   +   +
Sample IV      -   -   +   +   +   +   +   +

These data may be arranged in a 2 X 4 table as follows:

                 +      -

Sample I         2      6
Sample II        0      8
Sample III       4      2
Sample IV        6      2

                12     18



The value of x 2 calculated on t his table is 7. of) The number of 
degrees of freedom is $(4 - 1)(2 - 1) = 3$. The value of $\chi^2$ required for
significance at the 5 per cent level is 7.82. The observed value falls just
below this.


22.5

A rank test for two independent samples

Given two independent samples of $N_1$ and $N_2$ observations, the combined
$N_1 + N_2$ observations may be arranged in order. A rank 1 may be
assigned to the smallest value, a rank 2 to the next smallest, and so on.
The sums of ranks for the two samples may be obtained. Denote these
by $R_1$ and $R_2$ for the sample of $N_1$ and $N_2$ cases, respectively. Assuming
the samples to be drawn from the same population, what are the expected
sums of ranks? The expected value of $R_1$ is $N_1$ times the mean of the






$N_1 + N_2$ ranks and is

$$E(R_1) = \frac{N_1(N_1 + N_2 + 1)}{2} \tag{22.1}$$

Similarly, the expected value of $R_2$ is

$$E(R_2) = \frac{N_2(N_1 + N_2 + 1)}{2} \tag{22.2}$$

We calculate the deviation of $R_1$ or $R_2$ from the value expected on the
assumption that the samples are drawn from the same population. The
absolute deviations of $R_1$ and $R_2$ from expectation are equal.
Consequently we need only calculate either $R_1$ or $R_2$.

When both $N_1$ and $N_2$ are equal to or greater than 8, the sampling
distribution of the deviations of $R_1$, or $R_2$, from expectation may be
regarded as approximately normal, with a mean of zero and a standard
deviation of $\sqrt{N_1N_2(N_1 + N_2 + 1)/12}$. The normal deviate $z$ with a
continuity correction is then given by

$$z = \frac{|R_1 - E(R_1)| - \frac{1}{2}}{\sqrt{\dfrac{N_1N_2(N_1 + N_2 + 1)}{12}}} = \frac{|2R_1 - N_1(N + 1)| - 1}{\sqrt{\dfrac{N_1N_2(N + 1)}{3}}} \tag{22.3}$$

where $N = N_1 + N_2$. If this value is equal to or greater than 1.96 or 2.58,
we reject the null hypothesis at either the .05 or .01 level and accept the
alternative hypothesis that the samples are from different populations.

Consider the following observations:

Sample I     27  33  37  52  53  57  69  70  71  77
Sample II     6   9  14  16  29  43  45  47  50  55  63  72

Assigning ranks, proceeding from the smallest to the largest values, we
obtain

Sample I      5   7   8  13  14  16  18  19  20  22
Sample II     1   2   3   4   6   9  10  11  12  15  17  21

The sum of ranks $R_1$ for sample I is 142, and for sample II the sum $R_2$ is
111. The expected values of $R_1$ and $R_2$ under the null hypothesis are,
respectively, 115 and 138. $R_1$ is 27 points above and $R_2$ 27 points below
expectation. The normal deviate is, then,

$$z = \frac{27 - \frac{1}{2}}{\sqrt{\dfrac{10 \times 12 \times 23}{12}}} = 1.75$$






Since this falls below 1.96, we have no grounds for rejecting the null 
hypothesis for a two-tailed test. The result is, however, significant at 
the 5 per cent level for a one-tailed test. 
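
The rank-sum calculation can be sketched in Python as follows. The sketch assumes untied observations, uses the conventional continuity correction of one-half (an assumption where the printed formula is unclear), and the function name `rank_sum_z` is hypothetical.

```python
from math import sqrt


def rank_sum_z(sample_1, sample_2):
    """Return R1, its expected value, and the normal deviate z (no ties)."""
    combined = sorted(sample_1 + sample_2)
    if len(set(combined)) != len(combined):
        raise ValueError("this sketch assumes untied observations")
    rank = {value: position + 1 for position, value in enumerate(combined)}
    n1, n2 = len(sample_1), len(sample_2)
    r1 = sum(rank[value] for value in sample_1)
    expected = n1 * (n1 + n2 + 1) / 2
    sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    # Continuity correction of one-half: assumed conventional value.
    z = (abs(r1 - expected) - 0.5) / sd
    return r1, expected, z


sample_1 = [27, 33, 37, 52, 53, 57, 69, 70, 71, 77]
sample_2 = [6, 9, 14, 16, 29, 43, 45, 47, 50, 55, 63, 72]

r1, expected_r1, z = rank_sum_z(sample_1, sample_2)
print(r1, expected_r1, round(z, 2))
```

The resulting deviate lies between 1.645 and 1.96, which is why the result is significant at the 5 per cent level for a one-tailed test but not for a two-tailed test.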

When ties occur, the tied observations may be assigned the average
of the ranks they would occupy if no ties had occurred. If ties are fairly
numerous, a correction may be applied to the standard deviation in the
denominator of the $z$ ratio. Corrected for ties, that ratio becomes

$$z = \frac{|R_1 - E(R_1)| - \frac{1}{2}}{\sqrt{\dfrac{N_1N_2}{N(N - 1)}\left(\dfrac{N^3 - N}{12} - \sum T\right)}}$$

where $N = N_1 + N_2$ and $T = (t^3 - t)/12$, where $t$ is the number of
values tied at a particular rank. The summation of $T$ extends over all
groups of ties.

The above procedure is appropriate for samples greater than 8. For
samples less than 8, exact probabilities may be obtained from tables
based on the exact sampling distributions. These tables require the
calculation of a statistic $U$, the test being known as the Mann-Whitney
$U$ test. We calculate

$$U_1 = N_1N_2 + \frac{N_1(N_1 + 1)}{2} - R_1 \tag{22.5}$$

$$U_2 = N_1N_2 + \frac{N_2(N_2 + 1)}{2} - R_2 \tag{22.6}$$

These two values differ. $U$ is taken as the smaller of the two. Tables
have been prepared by Mann and Whitney (1947) showing the
probabilities associated with different values of $U$ for $N_1$ and $N_2$ up to 8.
Extended tables have been prepared by Auble (1953) for $N_1$ and $N_2$ up to
20. These tables are reproduced in Siegel (1956).
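
As a brief illustration of formulas 22.5 and 22.6, the sketch below computes $U$ from the rank sums of the preceding example ($R_1 = 142$, $R_2 = 111$, $N_1 = 10$, $N_2 = 12$). Those samples are larger than the tables of exact probabilities are meant for, so the numbers serve only to show the arithmetic; the function name `mann_whitney_u` is hypothetical.

```python
def mann_whitney_u(r1, r2, n1, n2):
    """Return U1, U2, and U (the smaller of the two) from the rank sums."""
    u1 = n1 * n2 + n1 * (n1 + 1) // 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) // 2 - r2
    return u1, u2, min(u1, u2)


u1, u2, u = mann_whitney_u(r1=142, r2=111, n1=10, n2=12)
print(u1, u2, u)

# A useful check on the arithmetic: U1 + U2 always equals N1 * N2.
assert u1 + u2 == 10 * 12
```

The identity $U_1 + U_2 = N_1N_2$ follows from the fact that the two rank sums together total $N(N + 1)/2$.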


22.6 

A rank test for two correlated samples 

The rank test described here for two correlated samples is due to Wilcoxon
and is sometimes called the Wilcoxon matched-pairs signed-ranks test.
The data are a set of $N$ paired observations. The difference $d$ between
each pair is calculated. If the two observations in a pair are the same,
then $d = 0$ and the pair is deleted from the analysis. Values of $d$ may be
either positive or negative. The $d$'s are then ranked without regard to
sign. A rank 1 is assigned to the smallest $d$, 2 to the next smallest, and
so on. If two or more $d$'s are tied, the usual practice is adopted of
assigning to the tied ranks the average of the ranks they would have been
assigned if they had differed. The sign of the difference $d$ is attached to
each rank. If $d$ is positive, the rank is positive; if $d$ is negative, the rank
is negative. Under the null hypothesis the sum of the positive ranks
will tend to equal the sum of the negative ranks. If a marked difference
between the sums is observed, this constitutes evidence for the rejection
of the hypothesis that the two sets of measurements are from the same
population. The smaller of the two sums of ranks is denoted by the
letter $T$. Table I of the Appendix provides values of $T$ required at
various significance levels for both a one-tailed and a two-tailed test for
$N$ up to 25.

The following are paired observations, $X$ and $Y$, for a sample of 10
individuals:

$X$       15    19    31    36    10    11    19    15    10    16
$Y$       19    30    26     8    10     6    17    13    22     8
$d$       -4   -11     5    28     0     5     2     2   -12     8
Rank      -3    -7   4.5     9         4.5   1.5   1.5    -8     6

Values of $d$ have been calculated. One pair of observations is tied and is
deleted from subsequent consideration. The $d$'s are rank-ordered by
absolute magnitude. The lowest values are a pair of 2's. These are
assigned rank values of 1.5. The sum of negative ranks is 18. The sum
of positive ranks is 27. Thus $T$, the smaller of the two sums, is 18. In
this example $N = 9$, a pair of observations having been deleted. Table I
of the Appendix shows that for $N = 9$ a value of $T$ equal to or less than
6 is required for significance at the 5 per cent level for a two-tailed test.
These data do not warrant rejection of the null hypothesis.
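
The ranking procedure, including the averaging of tied ranks, can be sketched in Python as follows. The function name `signed_rank_t` is hypothetical; the data are the ten pairs of the example, which are the same pairs used for the sign test in Sec. 22.3.

```python
def signed_rank_t(x, y):
    """Return N, the positive and negative rank sums, and T (the smaller)."""
    # Zero differences are deleted from the analysis.
    diffs = [a - b for a, b in zip(x, y) if a != b]
    ordered = sorted(abs(d) for d in diffs)
    # Tied absolute differences share the average of the ranks they span.
    average_rank = {}
    for value in set(ordered):
        positions = [i + 1 for i, v in enumerate(ordered) if v == value]
        average_rank[value] = sum(positions) / len(positions)
    positive = sum(average_rank[abs(d)] for d in diffs if d > 0)
    negative = sum(average_rank[abs(d)] for d in diffs if d < 0)
    return len(diffs), positive, negative, min(positive, negative)


x = [15, 19, 31, 36, 10, 11, 19, 15, 10, 16]
y = [19, 30, 26, 8, 10, 6, 17, 13, 22, 8]

n, positive_sum, negative_sum, t = signed_rank_t(x, y)
print(n, positive_sum, negative_sum, t)
```

The two rank sums always total $N(N + 1)/2$, here 45, which gives a quick check on the ranking.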

For large samples, $T$ has an approximate normal distribution with

$$\text{Mean} = \frac{N(N + 1)}{4} \tag{22.7}$$

and

$$\text{Standard deviation} = \sqrt{\frac{N(N + 1)(2N + 1)}{24}} \tag{22.8}$$

The normal deviate $z$ is given by

$$z = \frac{T - \dfrac{N(N + 1)}{4}}{\sqrt{\dfrac{N(N + 1)(2N + 1)}{24}}} \tag{22.9}$$

Values of 1.96 and 2.58 are, as usual, required for significance at the
5 per cent and 1 per cent levels for a two-tailed test.





22.7 

A rank test for k independent samples 

A rank test for $k$ independent samples is the Kruskal-Wallis (1952)
one-way analysis of variance by ranks. The null hypothesis is that the $k$
independent samples are from the same population. To apply this test
all the observations for the $k$ samples are ranked. The lowest value is
assigned a rank of 1, the next lowest 2, and so on. The sum of ranks
$R_i$ for each of the $k$ samples is obtained. A statistic $H$ is calculated from
the data. This is defined by

$$H = \frac{12}{N(N + 1)}\sum_{i=1}^{k}\frac{R_i^2}{n_i} - 3(N + 1)$$

where $n_i$ = number of observations in sample $i$
      $N$ = total number of observations
      $R_i$ = sum of ranks for sample $i$

For samples of reasonable size this statistic has a chi-square distribution
with $k - 1$ degrees of freedom and may be referred to any table of $\chi^2$. In
this context reasonable size may be interpreted to mean more than five
cases in the groups. For $k = 3$ and $n_i \le 5$, tables of exact probabilities
have been prepared by Kruskal and Wallis.

When ties occur, the usual convention is adopted of assigning to the
tied observations the average of the ranks they would otherwise occupy.
The value of $H$ is then divided by

$$1 - \frac{\sum T}{N^3 - N}$$

where $T = t^3 - t$, and $t$ is the number of tied observations in a group.
The quantity $H$ corrected for ties is

$$H = \frac{\dfrac{12}{N(N + 1)}\displaystyle\sum_{i=1}^{k}\frac{R_i^2}{n_i} - 3(N + 1)}{1 - \dfrac{\sum T}{N^3 - N}}$$

The correction for ties will increase the value of $H$.
The following are data for three samples: 

Sample I      3    7   11   16   22   29   31   36
Sample II     3    4    7   18   19   32
Sample III   22   38   46   47   47   50   53   54   56





In this example $n_1 = 8$, $n_2 = 6$, $n_3 = 9$, and N = 8 + 6 + 9 = 23. 
All 23 observations are ranked to obtain 

Sample I      1.5   4.5   6     7    10.5  12    13    15
Sample II     1.5   3     4.5   8     9    14
Sample III   10.5  16    17    18.5  18.5  20    21    22    23


The sums of ranks are calculated. These are $R_1 = 69.5$, $R_2 = 40$, and 
$R_3 = 166.5$. We note that we have four sets of ties of two observations 
each. $T = 2^3 - 2 = 6$, and for the four sets $\sum T = 24$. The value of 
H is then 

$$H = \frac{\dfrac{12}{23(23+1)}\left(\dfrac{69.5^2}{8} + \dfrac{40^2}{6} + \dfrac{166.5^2}{9}\right) - 3(23+1)}{1 - \dfrac{24}{23^3 - 23}} = \frac{13.88}{.998} = 13.91$$

In this example the effect of the correction for ties is negligible and may 
for all practical purposes be ignored: H is 13.88 before the correction 
and 13.91 after it. On reference to a table of $\chi^2$ with df = 2, we note 
that an H of this size is significant at better than the 1 per cent level. 
We may then reject the hypothesis that the samples are from 
the same population. 
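
The calculation above can be checked with a short Python sketch. The
function names are illustrative and not part of the text; the third
sample is entered with nine values, as required by $n_3 = 9$.

```python
from collections import Counter

def average_ranks(values):
    """Rank values 1..N, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1      # positions are 0-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks

def kruskal_wallis(samples):
    """Return rank sums, H, and H corrected for ties."""
    pooled = [x for sample in samples for x in sample]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    rank_sums, offset = [], 0
    for sample in samples:
        rank_sums.append(sum(ranks[offset:offset + len(sample)]))
        offset += len(sample)
    h = 12 / (n_total * (n_total + 1)) * sum(
        r * r / len(s) for r, s in zip(rank_sums, samples)) - 3 * (n_total + 1)
    tie_sum = sum(t ** 3 - t for t in Counter(pooled).values())   # sum of T
    h_corrected = h / (1 - tie_sum / (n_total ** 3 - n_total))
    return rank_sums, h, h_corrected

samples = [
    [3, 7, 11, 16, 22, 29, 31, 36],
    [3, 4, 7, 18, 19, 32],
    [22, 38, 46, 47, 47, 50, 53, 54, 56],
]
rank_sums, h, h_corrected = kruskal_wallis(samples)
print(rank_sums, round(h, 2), round(h_corrected, 2))
```

Either value of H is then referred to a chi-square table with k - 1 = 2
degrees of freedom.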


22.8 

A rank test for k correlated samples 

A rank test for k correlated samples is the Friedman two-way analysis of 
variance by ranks (1937). The data are a set of k observations for a 
sample of N individuals. Such data arise in many experiments where 
subjects are tested under a number of different experimental conditions. 
The corresponding parametric test is an analysis of variance for two-way 
classification where observations are made on each of a group of indi- 
viduals under more than two conditions. If there is reason to believe 
that the assumptions underlying the analysis of variance are not satisfied 
by the data, the Friedman rank method is appropriate. 

The data are arranged in a table containing N rows and k columns. 
The rows correspond to individuals, or groups, and the columns to 
experimental conditions. Table 22.1 shows such an arrangement of 
data for eight subjects tested under four experimental conditions. The 
observations in the rows are ordered as shown in Table 22.2. For 
example, the four observations in the top row are 4, 5, 9, and 3. These 
are replaced by the ranks 2, 3, 4, and 1. The ranks in each column are 





summed. If the samples are from the same population, the ranks in 
each column will be a random arrangement of the numbers 1 , 2, 3, and 4. 
Under these circumstances the sums of ranks for columns will tend to be 
the same. If these sums differ significantly, the hypothesis that they 
are from the same population may be rejected. 

Table 22.1 

Material recalled after four time intervals for a group 
of eight subjects 

              Time interval
Subject      I     II    III    IV
   1         4      5      9     3
   2         8      9     14     7
   3         7     13     14     6
   4        16     12     14    10
   5         2      4      7     6
   6         1      4      5     3
   7         2      6      7     9
   8         5      7      8     9

Table 22.2 

Ranks assigned by rows for the data of Table 22.1 

              Time interval
Subject      I     II    III    IV
   1         2      3      4     1
   2         2      3      4     1
   3         2      3      4     1
   4         4      2      3     1
   5         1      2      4     3
   6         1      3      4     2
   7         1      2      3     4
   8         1      2      3     4
  R_i       14     20     29    17

The test to be applied to the column sums of ranks is a chi-square 
test. We calculate the quantity 

$$\chi_r^2 = \frac{12}{Nk(k+1)} \sum_{i=1}^{k} R_i^2 - 3N(k+1)$$

where k = number of conditions 
N = number of individuals 
$R_i$ = rank sum of column i 

$\chi_r^2$ has an approximate chi-square distribution with k - 1 degrees of 
freedom. For the data of Table 22.2 we have 

$$\chi_r^2 = \frac{12}{8 \times 4(4+1)}\,(14^2 + 20^2 + 29^2 + 17^2) - 3 \times 8(4+1) = 9.45$$

This result for df = 4 - 1 = 3 falls between the 5 and 1 per cent levels of 
significance. Actually it is a little above the 2 per cent level. If this 
level of confidence is acceptable, we may conclude that the samples are 
not drawn from the same population and that a difference in the experi- 
mental conditions is exerting an effect. 

Exact probabilities are available for k = 3, N = 2 to 9, and for 
k = 4, N = 2 to 4. These tables are given by Friedman (1937) and 
Siegel (1956). 

Where ties occur the tied observations may be assigned the average 
of the ranks they would otherwise occupy. 
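
The Friedman calculation for Table 22.1 can be sketched in Python as
follows. The function names are illustrative only; the ranking helper
assigns average ranks to ties, although none occur in these data.

```python
def average_ranks(values):
    """Rank values 1..k within a row, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1      # positions are 0-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks

def friedman_chi_square(table):
    """Return the column rank sums and chi-square-r for an N x k table."""
    n, k = len(table), len(table[0])
    column_sums = [0.0] * k
    for row in table:
        for j, rank in enumerate(average_ranks(row)):
            column_sums[j] += rank
    chi_r = 12 / (n * k * (k + 1)) * sum(r * r for r in column_sums) - 3 * n * (k + 1)
    return column_sums, chi_r

# Data of Table 22.1: eight subjects (rows) under four time intervals (columns).
table_22_1 = [
    [4, 5, 9, 3], [8, 9, 14, 7], [7, 13, 14, 6], [16, 12, 14, 10],
    [2, 4, 7, 6], [1, 4, 5, 3], [2, 6, 7, 9], [5, 7, 8, 9],
]
column_sums, chi_r = friedman_chi_square(table_22_1)
print(column_sums, round(chi_r, 2))
```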

22.9 

Monotonic trend test for independent samples 

In Chap. 21 methods of trend analysis for parametric data were described. 
Methods of trend analysis using ranks may be used. These are the non- 
parametric analogues of the tests described in Chap. 21. A test of 
monotonic trend is the nonparametric analogue of a test of linear trend. 
This section describes a simple test of monotonic trend for independent 
samples, which employs the statistic S as used in the definition of Kendall's 
tau (see Chap. 14). 

Comment on the concept of monotonicity is appropriate here. A 
function Y = f(X) is said to be a monotonic increasing function if any 
increment in X is associated with an increment in Y. Also, if any incre- 
ment in X is associated with a decrement in Y, the function is said to be a 
monotonic decreasing function. The magnitude of the increment or 
decrement in X associated with the increment or decrement in Y is 
irrelevant to the concept of a monotonic function. Monotonicity is an 
order concept. 





A common type of experiment is one in which k treatments are 
applied to k independent groups of $n_1, n_2, \ldots, n_k$ members, and a 
measurement obtained for each member. Such data are frequently 
analyzed using the analysis of variance for one-way classification. The 
nonparametric analogue of this is the Kruskal-Wallis one-way analysis 
of variance by ranks, described in Sec. 22.7. If, however, the k treat- 
ments exhibit an order, the question of monotonic trend may be raised. 
In effect, this question is answered by testing for significance the correla- 
tion between the ranks corresponding to the $n_1 + n_2 + \cdots + n_k = N$ 
measurements and the ranks for the k treatments. The ranks for treat- 
ments consist of k sets of tied members with $n_1, n_2, \ldots, n_k$ members, 
respectively, in each of the k sets. Thus for k = 3, and 

$$n_1 = n_2 = n_3 = 10$$

the ranks for treatments consist of three sets of 10 tied values. 

To illustrate, the following are measurements for three independent 
samples obtained under three ordered treatments. 

Sample I      3    7   11   16   22   29   31   36
Sample II     3    4    7   18   19   32
Sample III   22   38   46   47   47   50   53   54   56

These values are ranked as in the Kruskal-Wallis one-way analysis of 
variance by ranks. The values 1, 2, and 3 are assigned to indicate 
membership in the three samples. Thus the data are represented by a 
set of paired ranks as follows : 


Sample I     X    1     1     1     1     1     1     1     1
             Y    1.5   4.5   6     7    10.5  12    13    15

Sample II    X    2     2     2     2     2     2
             Y    1.5   3     4.5   8     9    14

Sample III   X    3     3     3     3     3     3     3     3     3
             Y   10.5  16    17    18.5  18.5  20    21    22    23


The average rank procedure for tied ranks could, of course, be applied to 
the X variable, in which case the values 4.5, 11.5, and 19 would replace 
the values 1, 2, and 3. No advantage attaches to this. In practical 
computation the X variable need not be recorded at all. The X variable 
is shown here merely to illustrate the fact that the problem is a simple 
correlational one. A value of S as used in the definition of Kendall's tau, 
and described in Chap. 14 of this book, is calculated. The value of S in 
this example is as follows: 

$$S = 14 + 10 + 9 + 9 + 4 + 3 + 3 + 1 + 9 + 9 + 9 + 9 + 9 + 7 = 105$$





The value 14 is obtained by comparing the initial Y value of 1.5 for 
sample I with all Y values for samples II and III. The Y value of 1.5 for 
sample I need not be compared with other Y values for that sample 
because all values in sample I are tied on X. 

The sampling variance of S is obtained from formula 14.8. In this 
example N = 23. The X variable contains three groups of ties: one 
group of eight ties, one of six ties, and one of nine ties. The Y variable 
contains four groups of tied pairs. The variance of S from formula 14.8 
is 1,245.25, and the standard error is 35.28. Subtracting unity as a con- 
tinuity correction and assuming normality of the sampling distribution, 
we obtain the normal deviate z = 104/35.28 = 2.95, which warrants 
rejection of the null hypothesis. 

The steps involved in this procedure may be stated in summary as 
follows: 

1 Rank all the observations from 1 to N on the Y vari- 
able, as in the Kruskal-Wallis one-way analysis of 
variance by ranks, substituting average ranks for tied 
values. Use the ranks for the order of treatments as 
the X variable, which will consist of as many sets of 
ties as there are treatments. 

2 Calculate S in the usual way. 

3 Calculate the sampling variance of S by applying 
formula 14.8. 

4 Subtract unity from S as a continuity correction, and 
divide this by its standard error to obtain the normal 
deviate z. 
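
The four steps can be sketched in Python as follows. Formula 14.8 is
not reproduced in this section, so the sketch assumes it is the usual
Kendall variance of S with ties in both variables; under that assumption
it reproduces the values 105, 1,245.25, and 2.95 given above. The
function names are illustrative. Because S depends only on order, the
raw measurements are used in place of their ranks.

```python
import math
from collections import Counter

def sign(v):
    return (v > 0) - (v < 0)

def kendall_s(x, y):
    """S: concordant minus discordant pairs; tied pairs contribute zero."""
    s = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            s += sign(x[j] - x[i]) * sign(y[j] - y[i])
    return s

def s_variance_with_ties(x, y):
    """Variance of S with ties in both variables (assumed form of formula 14.8)."""
    n = len(x)
    tx, ty = Counter(x).values(), Counter(y).values()
    a = lambda ts: sum(t * (t - 1) * (2 * t + 5) for t in ts)
    b = lambda ts: sum(t * (t - 1) * (t - 2) for t in ts)
    c = lambda ts: sum(t * (t - 1) for t in ts)
    variance = (n * (n - 1) * (2 * n + 5) - a(tx) - a(ty)) / 18
    variance += b(tx) * b(ty) / (9 * n * (n - 1) * (n - 2))
    variance += c(tx) * c(ty) / (2 * n * (n - 1))
    return variance

samples = [
    [3, 7, 11, 16, 22, 29, 31, 36],
    [3, 4, 7, 18, 19, 32],
    [22, 38, 46, 47, 47, 50, 53, 54, 56],
]
x = [order for order, sample in enumerate(samples, start=1) for _ in sample]
y = [value for sample in samples for value in sample]

s = kendall_s(x, y)
variance = s_variance_with_ties(x, y)
z = (s - sign(s)) / math.sqrt(variance)    # continuity correction of unity
print(s, round(variance, 2), round(z, 2))
```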


22.10 

Monotonic trend test for correlated samples 

This test may be applied to data obtained by making measurements on N 
subjects, each under k ordered conditions. We proceed by replacing the 
original measurements by ranks as in the Friedman two-way analysis of 
variance by ranks described in Sec. 22.8. The statistic S, as used in the 
definition of Kendall's tau, is calculated for each of the N subjects. The 
values of S are summed over the N subjects to obtain $\sum S$. The quantity 
$\sum S$ is a measure of monotonic trend. It is descriptive of the increase, or 
decrease, in the experimental variable with increase in the treatment 
variable. 

In the absence of ties the sampling variance of S for any subject is 
given by formula 14.5 and is the same for all subjects. The variance of 





$\sum S$ is the sum of the separate variances. Thus 

$$\sigma_{\Sigma S}^2 = \sum \sigma_S^2 = N\sigma_S^2$$

Assuming the normality of the distribution of $\sum S$, the normal deviate z is 

$$z = \frac{\sum S}{\sigma_{\Sigma S}} \tag{22.13}$$

The normal deviate z takes the usual critical values 1.96 and 2.58 at the 
.05 and .01 levels, respectively, for a nondirectional test. In many 
experimental situations in which trend tests are used, some prior basis 
exists for predicting the direction of the trend; consequently a directional 
test will frequently be appropriate. 

Estimates of probabilities obtained from the normal approximation 
to the distribution of $\sum S$ will be improved by using a correction for 
continuity. To apply this correction, we subtract unity from $\sum S$ if it is 
positive and add unity if it is negative. Thus the absolute value of $\sum S$ is 
reduced by unity. 

The method described above is illustrated with reference to the data 
of Table 22.1, which show hypothetical measurements for eight subjects 
under four treatments. These measurements have been ranked for each 
of the N subjects, as shown in Table 22.2. A value of S is calculated for 
each of the N subjects. The values of S summed over subjects are $\sum S$ as 
follows: 

$$\sum S = 0 + 0 + 0 - 4 + 4 + 2 + 6 + 6 = 14$$

For k = 4 from formula 14.5 the sampling variance $\sigma_S^2 = 8.67$. 
Note that in this case in using formula 14.5 we write N = k, the number 
of treatment groups. The variance $\sigma_{\Sigma S}^2 = 8 \times 8.67 = 69.33$, and 
$\sigma_{\Sigma S} = 8.33$. Reducing $\sum S$ by unity as a continuity correction results in 
z = 13/8.33 = 1.56, which falls short of significance at the .05 level. 
Thus we cannot argue for a significant monotonic trend from these data. 

It is of incidental interest to note that for k = 2 the quantity 
$(|\sum S| - 1)^2/N$ is distributed approximately as $\chi^2$ with df = 1, and the 
quantity $(|\sum S| - 1)/\sqrt{N}$ has an approximately normal distribution. 
In this case the present test is the same as the usual sign test for two 
correlated samples. 

The steps involved in the above procedure may be summarized as 
follows: 

1 Rank the scores for each subject from 1 to k. 

2 Calculate S for each subject. 

3 Sum S for all subjects to obtain $\sum S$. 

4 Calculate the sampling variance of S, $\sigma_S^2$, using formula 
14.5, and multiply this by N to obtain the sampling 
variance of $\sum S$, $\sigma_{\Sigma S}^2$. The square root of this quantity 
is the standard error. 

5 Divide $|\sum S| - 1$ by the standard error $\sigma_{\Sigma S}$ to obtain 
the normal deviate z. 

6 Reject the null hypothesis for a nondirectional test at 
the .05 level if z > 1.96, and at the .01 level if z > 2.58. 
The corresponding values of z for a directional test are 
1.64 and 2.33. 

Ties may occur in the measurements obtained for each subject. 
Ties will presumably not ordinarily occur in the treatment variable, the 
values of that variable being controlled by the experimenter. Under 
these circumstances the value $\sigma_S^2$ may be calculated separately for each 
subject using formula 14.7, and the values of $\sigma_S^2$ summed to obtain 
$\sigma_{\Sigma S}^2$. A more convenient procedure is to calculate $\sigma_{\Sigma S}^2$ as in the untied 
case, and then subtract unity from this variance for each tied pair, 3.67 
for each triplet of ties, and so on. 

Illustrative data are shown in Table 22.3. This table shows meas- 
urements obtained for eight subjects under four conditions. The quan- 
tity $\sum S$ has been calculated, and is found to be -25. The data contain 
five tied pairs, and one triplet of ties. For k = 4 without ties the sam- 
pling variance $\sigma_S^2 = 8.67$, and the variance $\sigma_{\Sigma S}^2 = 8 \times 8.67 = 69.33$. 
We subtract from this 5.00 for the five tied pairs and 3.67 for the triplet 
of ties to obtain a corrected value of the variance of 60.67 and a standard 
error of 7.79. Reducing the absolute value of $\sum S$ by unity as a continuity 
correction results in z = 24/7.79 = 3.08, which is significant at better 
than the .01 level for a nondirectional test. 
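
The procedure for correlated samples can be sketched in Python as
follows. Formulas 14.5 and 14.7 are not reproduced in this section; the
sketch assumes that 14.5 is k(k - 1)(2k + 5)/18 and that 14.7 reduces
this by t(t - 1)(2t + 5)/18 for each group of t tied measurements, which
agrees with the values 8.67, unity, and 3.67 quoted in the text. The
function names are illustrative, and the example uses the untied data of
Table 22.1.

```python
import math
from collections import Counter

def sign(v):
    return (v > 0) - (v < 0)

def kendall_s(scores):
    """S for one subject across the k ordered conditions."""
    s = 0
    for i in range(len(scores)):
        for j in range(i + 1, len(scores)):
            s += sign(scores[j] - scores[i])
    return s

def s_variance(scores):
    """Variance of S for one subject, allowing ties in the measurements."""
    k = len(scores)
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in Counter(scores).values())
    return (k * (k - 1) * (2 * k + 5) - tie_term) / 18

def trend_test(table):
    """Return the sum of S, its variance, and z with continuity correction."""
    total_s = sum(kendall_s(row) for row in table)
    variance = sum(s_variance(row) for row in table)
    z = (total_s - sign(total_s)) / math.sqrt(variance)
    return total_s, variance, z

table_22_1 = [
    [4, 5, 9, 3], [8, 9, 14, 7], [7, 13, 14, 6], [16, 12, 14, 10],
    [2, 4, 7, 6], [1, 4, 5, 3], [2, 6, 7, 9], [5, 7, 8, 9],
]
total_s, variance, z = trend_test(table_22_1)
print(total_s, round(variance, 2), round(z, 2))

# Reduction in variance for one tied pair and for one triplet (k = 4).
pair_cut = s_variance([1, 2, 3, 4]) - s_variance([7, 7, 4, 2])
triplet_cut = s_variance([1, 2, 3, 4]) - s_variance([4, 2, 2, 2])
print(round(pair_cut, 2), round(triplet_cut, 2))
```

Summing the per-subject variances gives the same result as computing the
untied variance and subtracting the reduction for each tied group.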

For a more detailed discussion of nonparametric trend tests see 
Ferguson. 


22.11 

A rank test for comparing the variation 
in two independent samples 

Siegel and Tukey have proposed a nonparametric test of relative 
variation in two independent samples. Consider the following observa- 
tions for two samples : 

Sample I 25 5 14 19 0 17 15 8 8 

Sample II 12 16 6 13 13 3 10 10 11 





We proceed by arranging the observations in a single series of increasing 
size and identifying the sample as I or II. 


Score     0    3    5    6    8    8   10   10   11
Sample    I   II    I   II    I    I   II   II   II

Score    12   13   13   14   15   16   17   19   25
Sample   II   II   II    I    I   II    I    I    I


Table 22.3 

Data illustrating trend test for correlated data with ties 

              Measurements under four conditions
Subject      I     II    III    IV
   1         7      7      4     2
   2         6      1      5     5
   3         3      2      2     2
   4         4      2      2     2
   5         8      6      5     2
   6         4      7      2     2
   7         9      4      3     1
   8        10      4      2     2

              Ranks under four conditions
Subject      I     II    III    IV       S
   1        3.5    3.5    2      1      -5
   2        4      1      2.5    2.5    -1
   3        2      1      3.5    3.5    +3
   4        4      2      2      2      -3
   5        4      3      2      1      -6
   6        2      4      1.5    1.5    -3
   7        4      3      2      1      -6
   8        4      3      1.5    1.5    -4

$\sum S = -25$;  z = 24/7.79 = 3.08;  p < .01 

If we wish to test the null hypothesis that the samples come from 
the same population against the alternative that the populations differ in 
location, we would very simply assign ranks 1 to 18 according to the score, 
determine the sum of ranks for I or II, and proceed with a rank test of 
the type described in Sec. 22.5. If, however, we are interested in a test 
of variation, and not location, we may assign ranks in a different fashion. 
We assign rank 1 to the lowest number in the sequence, ranks 2 and 3 
to the two highest numbers, ranks 4 and 5 to the next two lowest, and 
so on. This ranking procedure is illustrated as follows: 


Score     0    3    5    6    8    8   10   10   11
Sample    I   II    I   II    I    I   II   II   II
Rank      1    4    5    8    9   12   13   16   17

Score    12   13   13   14   15   16   17   19   25
Sample   II   II   II    I    I   II    I    I    I
Rank     18   15   14   11   10    7    6    3    2


If the total number of ranks is an odd number, the middle rank is omitted; 
that is, no rank is assigned to it. Under the null hypothesis the mean 
rank for sample I will tend to equal the mean rank for sample II. If the 
populations differ in variation, the sample from the population with the 
greater variation will tend to fall at the extremes of the rank sequence, 
and in consequence will be assigned lower ranks. The sample from the 
population with the lesser variation will tend to fall in the middle of the 
sequence, and will be assigned higher ranks. Thus the more variable 
sample will have a smaller mean rank than the less variable sample. 

The rank test described in Sec. 22.5 may now be applied. The only 
difference between the present test of variation and the test of location 
given in Sec. 22.5 is in the method of assigning ranks. 

In the above example $R_1 = 59$ and $R_2 = 112$, and z is given by 

$$z = \frac{|2R_1 - N_1(N+1)| - 2}{\sqrt{N_1 N_2 (N+1)/3}} = \frac{|2 \times 59 - 9 \times 19| - 2}{\sqrt{9 \times 9 \times 19/3}} = 2.25$$

This falls between the .05 and .01 levels for a nondirectional test. With 
ties the correction suggested in Sec. 22.5 may be used. 
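
The ranking scheme and the resulting z can be sketched in Python as
follows. The function name is illustrative. The sketch applies no
correction for ties, and the z formula is written as it appears in the
example above; the rank sums assume an even total number of
observations, so that no position is left unranked.

```python
import math

def siegel_tukey_ranks(n):
    """Ranks for positions 0..n-1 of the ordered series: rank 1 to the
    lowest, 2 and 3 to the two highest, 4 and 5 to the next two lowest,
    and so on. With odd n the middle position is left unranked (None)."""
    ranks = [None] * n
    low, high = 0, n - 1
    usable = n if n % 2 == 0 else n - 1
    rank, from_low, batch = 1, True, 1
    while rank <= usable:
        for _ in range(batch):
            if rank > usable:
                break
            if from_low:
                ranks[low] = rank
                low += 1
            else:
                ranks[high] = rank
                high -= 1
            rank += 1
        from_low = not from_low
        batch = 2          # after the first rank, positions are taken in pairs
    return ranks

sample_1 = [25, 5, 14, 19, 0, 17, 15, 8, 8]
sample_2 = [12, 16, 6, 13, 13, 3, 10, 10, 11]
pooled = sorted([(x, 1) for x in sample_1] + [(x, 2) for x in sample_2])
ranks = siegel_tukey_ranks(len(pooled))

r1 = sum(r for (_, group), r in zip(pooled, ranks) if group == 1)
r2 = sum(r for (_, group), r in zip(pooled, ranks) if group == 2)
n1, n2 = len(sample_1), len(sample_2)
n = n1 + n2
z = (abs(2 * r1 - n1 * (n + 1)) - 2) / math.sqrt(n1 * n2 * (n + 1) / 3)
print(r1, r2, round(z, 2))
```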


EXERCISES 


1 The following are data for two groups of experimental animals: 

Group I    104  109  127  143  187  204  209  266  277
Group II    62   82   89   90  101  106  109  109  205

Apply a sign test to test the hypothesis that the two samples come 
from populations with the same median. 





2 The following are data for a sample of nine animals tested under con- 
trol and experimental conditions: 

Control:        21   24   26   32   55   82   46   55   88
Experimental:   18    9   23   26   82  199   42   30   62

Test the significance of the difference between the two medians using a 
sign test. 

3 The following data are for four groups of experimental animals: 

Group I       5    7   16   14   19
Group II      8   15   18   20   24
Group III    17   21   22   25   29
Group IV     23   27   28   31   32

Apply a sign test to test the hypothesis that the four samples come 
from populations with the same median. 

4 Apply the Mann-Whitney U test to the data of Exercise 1 above. 

5 Apply the Wilcoxon matched-pairs signed-ranks test to the data of 
Exercise 2 above. 

6 Apply a Kruskal-Wallis one-way analysis of variance by ranks to the 
data of Exercise 3 above. 

7 Apply a Friedman two-way analysis of variance by ranks to the follow- 
ing data: 


              Treatment
Subject      I     II    III    IV
   1         5      9      4     1
   2         0      8      7     3
   3         9     10      8     7
   4         5     10      4     2
   5         8      0      4     1
   6        10      8      7     5
   7        14     12     13    10


8 Apply a monotonic trend test to the data of Exercise 3 above. 

9 Apply a monotonic trend test to the data of Exercise 7 above. 



Errors of Measurement 



The nature of error 


The measurements obtained in the conduct of experiments are subject 
to error of greater or less degree. In measuring the activity of a rat, the 
intelligence of a child, or the response latency of an experimental subject, 
we may assume that the individual measurements are subject to some 
error. In general, the concept of error always implies a true, fixed, 
standard, or parametric value which we wish to estimate and from which 
an observed measurement may differ by some amount. The difference 
between a true value and an observed value is an error. If we represent 
a particular observation by $X_i$, the true value which it purports to esti- 
mate by $T_i$, and an error by $e_i$, we may write 

$$e_i = X_i - T_i \tag{23.1}$$

where $e_i$ may take either positive or negative values. 

A distinction may be made between systematic and random error. 
Observations which consistently overestimate or underestimate the true 
value are subject to systematic error. A stop watch which underesti- 
mates time intervals will yield observations with systematic errors. A 
random error exhibits no systematic tendency to be either positive or 
negative and is assumed to average to zero over a large number of sub- 
jects or trials. Random errors are also assumed to be uncorrelated both 
with true scores and with each other. The discussion in this chapter is 
concerned exclusively with random errors. 

Any definition of error as the difference between an observed and 
true value is meaningless unless a precise definition is attached to the 
concept of true value. In theory a true value is sometimes conceptual- 
ized as the mean of an indefinitely large number of measurements of an 
attribute made under conditions such that the true value remains con- 
stant, and the procedures used in making the measurements do not change 
from trial to trial in any known systematic fashion. In mathematical 





language the true value may be defined as 

$$T_i = \lim_{K \to \infty} \frac{\sum_{j=1}^{K} X_j}{K}$$

where $X_j$ refers to the jth measurement. Thus the true value is the limit 
approached by the arithmetic mean as the number of repeated observa- 
tions K is increased indefinitely. This concept of true value is appropri- 
ate for the measurements of physical quantities. For example, a yard- 
stick may be used to measure the length of a desk. The measurement 
procedure may be repeated many times, and the variation in the observa- 
tions attributed to error. It may be assumed that a considerable number 
of repeated observations may be made under fairly constant conditions, 
neither the desk nor the yardstick changing in any systematic way. By 
increasing the number of observations and taking their mean, the error 
in estimating the true value may be reduced. Theoretically, this error 
may be made as small as we like by increasing the number of observa- 
tions. As the number of observations becomes indefinitely large, the 
mean approaches the true value as a limit. 

Questions may be raised about the appropriateness of this concept of 
true value in the measurement of psychological quantities. Clearly, in 
the measurement of human behavior the making of a large number of 
repeated observations is usually not possible. The attribute being 
measured may fluctuate or change markedly with time, or the process of 
repeated measurement may modify the attribute under study. For 
example, in measuring the intelligence of a child, it is obviously out of the 
question to administer the same intelligence test 100 times to obtain an 
estimate of error. Quite apart from the labor involved in such estima- 
tion, the results obtained would be invalidated by practice, fatigue, and 
other effects. This circumstance has given rise in psychological work to 
a variety of procedures for estimating error other than by a series of 
repeated measurements. Despite the operational impracticality in 
psychology of estimating error by making a large number of repeated 
measurements, the concept of true score as the mean of an indefinitely 
large number of such measurements is still a necessary and important 
concept in the study of errors of measurement. Here we note that the 
role of true score is analogous to that of population parameter in sampling 
statistics. The difference between the sample statistic and the popula- 
tion parameter is a sampling error. By increasing sample size the magni- 
tude of sampling error is reduced. For an infinite population an unbiased 
sample statistic will approach the population parameter as a limit as the 
sample becomes indefinitely large. A sampling error is an error associ- 
ated with a statistic based on a sample of observations. An error of 
measurement is usually construed to be an error associated with a par- 
ticular observation which is an estimate of a true value. In most 
instances both population parameters and true values cannot be known 
but can only be estimated from fallible data. This circumstance does 
not detract from the meaningfulness of, and necessity for, these concepts, 
nor does it prevent the making of meaningful statements about the 
magnitude of error. A concept of true value, however defined, is a 
logical necessity for any theory of error. 

23.2 

Effect of measurement error 
on the mean and variance 

Consider a population of measurements. Each measurement is subject 
to error and may be written as 

$$X_i = T_i + e_i$$

where $X_i$ is the observed and $T_i$ the true measurement. By summation 
over all members of the population we obtain 

$$\sum X_i = \sum T_i + \sum e_i$$

If we assume that measurement error is random, and as often positive as 
negative, we may write $\sum e_i = 0$. Consequently, the sum of measure- 
ments subject to error is equal to the sum of true measurements. It 
follows also that the means of the observed and true values are equal, 
both being equal to the population mean $\mu$. We conclude that measure- 
ment error exerts no systematic effect on the arithmetic mean. A mean 
based on a sample of N measurements will exhibit no tendency to be 
either greater than or less than the mean of true measurements. The 
expectations of the mean of observed and true scores are equal to the 
population mean; that is, 

$$E(\bar{X}) = E(\bar{T}) = \mu \tag{23.2}$$

Measurement error exerts an effect on the sampling variance of the arith- 
metic mean. This point is discussed in Sec. 23.6. 

Measurement error exerts a systematic effect on the variance. We 
may write 

$$(X_i - \mu) = (T_i - \mu) + e_i$$

If we square this identity, sum over all members of the population, and 
divide by $N_p$, where $N_p$ is the number of members in the population, we 


376 


Errors of measurement 


chap. 23 


obtain 

$$\frac{\sum (X_i - \mu)^2}{N_p} = \frac{\sum (T_i - \mu)^2}{N_p} + \frac{\sum e_i^2}{N_p} + \frac{2\sum (T_i - \mu)e_i}{N_p}$$

On the assumption that measurement errors are random and uncorrelated 
with true scores, the third term to the right is equal to zero, and we may 
write 

$$\sigma_x^2 = \sigma_T^2 + \sigma_e^2 \tag{23.3}$$

Thus the variance of observed scores is equal to the variance of true 
scores plus the variance of the errors of measurement. For a fixed $\sigma_T^2$, the 
more inaccurate the measurements the greater the value of $\sigma_e^2$ and the 
greater the variance $\sigma_x^2$. 
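
The decomposition in formula 23.3 can be checked on a small constructed
example. The scores below are hypothetical; the errors were chosen so
that they sum to zero and are exactly uncorrelated with the true scores,
which is the assumption under which the cross-product term vanishes.

```python
from statistics import mean, pvariance

# Hypothetical true scores and errors for a population of four members.
true_scores = [9, 9, 11, 11]
errors = [1, -1, 1, -1]
observed = [t + e for t, e in zip(true_scores, errors)]

mu = mean(true_scores)
cross_product = sum((t - mu) * e for t, e in zip(true_scores, errors))

print(mean(observed), mean(true_scores))          # the two means agree
print(cross_product)                              # cross-product term
print(pvariance(observed), pvariance(true_scores), pvariance(errors))
```

With real data the cross-product term is only approximately zero, so
the equality holds in expectation and not exactly.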

23.3 

The reliability coefficient 

Consider a situation where each member of a population has been meas- 
ured on two separate occasions. Two observations are available for each 
member. Both are presumed to be measures of the same attribute, and 
both are subject to error. We may write 

$$X_{i1} = T_i + e_{i1}$$
$$X_{i2} = T_i + e_{i2}$$

In deviation form these become 

$$(X_{i1} - \mu) = (T_i - \mu) + e_{i1}$$
$$(X_{i2} - \mu) = (T_i - \mu) + e_{i2}$$

By multiplying these two equations, summing over a population of $N_p$ 
members, and dividing by $N_p\sigma_1\sigma_2$, we obtain 

$$\rho_{xx} = \frac{\sum (X_{i1} - \mu)(X_{i2} - \mu)}{N_p\sigma_1\sigma_2} = \frac{\sum (T_i - \mu)^2 + \sum e_{i1}e_{i2} + \sum e_{i1}(T_i - \mu) + \sum e_{i2}(T_i - \mu)}{N_p\sigma_1\sigma_2}$$

On the assumption that errors are random and uncorrelated with each 
other and with true scores, the three terms on the right in the numerator 
are equal to zero. Because the paired observations are measures of the 
same attribute, $\sigma_1 = \sigma_2$. Also $\sum (T_i - \mu)^2 = N_p\sigma_T^2$. Hence, writing 
$\sigma_1 = \sigma_2 = \sigma_x$, 

$$\rho_{xx} = \frac{\sigma_T^2}{\sigma_x^2} \tag{23.4}$$





where $\rho_{xx}$ is the reliability coefficient. The reliability coefficient is a 
simple proportion. It is the proportion of obtained variance that is true 
variance. If $\sigma_x^2 = 400$ and $\sigma_T^2 = 360$, the reliability coefficient $\rho_{xx} = .90$. 
This means that 90 per cent of the variation in the measurements is 
attributable to variation in true score, the remaining 10 per cent being 
attributable to error. Where sample estimates are used we may write 

$$r_{xx} = \frac{s_T^2}{s_x^2} \tag{23.5}$$

where $r_{xx}$ is the sample estimate of the reliability coefficient. 


23.4 

Methods for determining reliability 

Above, the reliability coefficient has been discussed without reference to 
methods for obtaining such coefficients in practice. A number of differ- 
ent practical methods for determining reliability are used. These 
methods are as follows. 

1 Test-retest method The same measuring instrument is applied on 
two occasions to the same sample of individuals. When the instru- 
ment is a psychological test, the test is administered twice to a sample 
of individuals and the scores correlated. 

2 Parallel-forms method Parallel or equivalent forms of a test may 
be administered to the same group of subjects, and the paired observa- 
tions correlated. Criteria of parallelism are required. 

3 Split-half method This method is appropriate where the testing 
procedure may in some fashion be divided into two halves and two 
scores obtained. These may be correlated. With psychological tests 
a common procedure is to obtain scores on the odd and even items. 

4 Internal-consistency methods These are used with psychological 
tests comprised of a series of items, usually dichotomously scored, a 
1 being assigned for a pass and a 0 for a failure. These methods 
require a knowledge of certain test-item statistics. 

The interpretation of a reliability coefficient depends on the method 
used to obtain it. When the same test is administered twice to the same 
group with a time interval separating the two administrations, some 
variation, fluctuation, or change in the ability or function measured 





may occur. The departure of r xx from unity may be construed to result 
in part from error and in part from changes in the ability or function 
measured. With many psychological tests the value of $r_{xx}$ will show a 
systematic decrease with increase in the time interval separating the two 
administrations. When the time interval is short, memory effects may 
operate. The subject may recall many of his previous responses and 
proceed to reproduce them. A spuriously high correspondence between 
measurements obtained at the two testings may thereby result. Regard- 
less of the time interval separating the two testings, varying environ- 
mental conditions such as noise, temperature, and other factors may 
affect the result obtained. Likewise, varying physiological factors, 
fatigue and the like, may exert an influence. 

In estimating reliability by the administration of parallel or equiv- 
alent forms of a test, criteria of parallelism are required. Test content, 
type of item, instructions for administering, and the like, should be 
similar for the different forms. Also the parallel forms should have 
approximately equal means and standard deviations. In addition, the 
intercorrelations should be equal. Thus with three parallel tests the 
intercorrelations should be such that $r_{12} = r_{13} = r_{23}$. A discussion of 
criteria for parallel tests is given by Gulliksen (1950). Situations arise 
where a large pool or population of test items is available. Samples of 
items may be drawn at random. Each sample of items is a randomly 
parallel form. This approach to the development of parallel tests has 
been studied at length by Lord (1955a, 1955b). 

In many situations a single administration only of a test may be 
possible. The test is divided into two halves. A not uncommon pro- 
cedure is to divide a test into odd and even items. Scores are obtained 
on the two halves, and these are correlated. The result is a reliability 
coefficient for a half test. Given a reliability coefficient for a half test,
the reliability coefficient for a whole test may be estimated using the
Spearman-Brown formula. This formula is

$$r_{xx} = \frac{2 r_{hh}}{1 + r_{hh}} \qquad (23.6)$$

where $r_{hh}$ is the reliability of a half test. If, for example, $r_{hh} = .80$, then
$r_{xx} = .89$. The Spearman-Brown formula provides an estimate of the
reliability of the whole test. It estimates what the reliability would be if
each test half were made twice as long.
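
The odd-even procedure can be sketched in a few lines of Python. This is an illustrative sketch only: the item matrix is hypothetical data (rows are individuals, columns are items scored 1 or 0), and the names `pearson_r`, `spearman_brown`, and `split_half_reliability` are assumptions introduced for the example.

```python
from statistics import mean


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_brown(r_hh):
    """Formula 23.6: whole-test reliability from half-test reliability."""
    return 2 * r_hh / (1 + r_hh)


def split_half_reliability(item_scores):
    """Odd-even split: correlate the two half scores, then apply 23.6."""
    odd = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    r_hh = pearson_r(odd, even)
    return r_hh, spearman_brown(r_hh)


# Hypothetical data: six individuals, six dichotomously scored items.
ITEMS = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

r_hh, r_xx = split_half_reliability(ITEMS)
print(f"half-test r = {r_hh:.3f}, whole-test estimate = {r_xx:.3f}")
print(f"text example: r_hh = .80 gives r_xx = {spearman_brown(0.80):.2f}")
```

A different split of the same items (first half against second half, say) would generally give a different coefficient.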

The split-half method should not be used with highly speeded test 
material. Obviously, if a test is comprised of easy items, and a subject is 
required to complete as many items as possible within a limited time 
interval, and all or nearly all items are correct, the scores on the two
halves would be about the same and the correlation would be close to
+1.00.

A method of obtaining reliability coefficients using test-item sta- 
tistics has been developed by Kuder and Richardson (1937). Many 
psychological tests are constructed of dichotomously scored items. An 
individual either passes or fails the item. A 1 is assigned for a pass, and 0 
for a failure. The score is the number of items done correctly. The 
proportion of individuals passing item i is denoted by the symbol $p_i$ and
the proportion failing by $q_i$, where $q_i = 1 - p_i$. An estimate of
reliability is given by

$$r_{xx} = \frac{n}{n-1} \cdot \frac{s_x^2 - \sum_{i=1}^{n} p_i q_i}{s_x^2} \qquad (23.7)$$

where $n$ = number of test items
$s_x^2$ = variance of scores on the test, defined as $\sum (X - \bar{X})^2 / N$
$p_i q_i$ = product of the proportions of passes and fails for item i
$\sum_{i=1}^{n} p_i q_i$ = sum of these products for the n items

This formula is frequently referred to as Kuder-Richardson formula 20.
The coefficient $r_{xx}$ computed by this formula will take values ranging from
zero to unity. If the responses of individuals to the test items are
assigned at random, the expectation of $s_x^2$ is equal to $\sum_{i=1}^{n} p_i q_i$ and the
expectation of $r_{xx}$ is zero. If all items are perfectly correlated, a situation
which can only arise when all have the same difficulty, $r_{xx} = 1$. The
correlation between items is the phi coefficient.
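
As a rough illustration of how Kuder-Richardson formula 20 is computed from item data, the following Python sketch may help. The item matrix is hypothetical, and the function name `kr20` is an assumption made for this example.

```python
def kr20(item_scores):
    """Kuder-Richardson formula 20 for items scored 1 (pass) or 0 (fail)."""
    n_people = len(item_scores)
    n_items = len(item_scores[0])

    totals = [sum(row) for row in item_scores]
    mean_total = sum(totals) / n_people
    # Variance with N in the denominator, matching the definition of s_x^2.
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_people

    # Sum of p_i * q_i over the items.
    sum_pq = 0.0
    for i in range(n_items):
        p = sum(row[i] for row in item_scores) / n_people
        sum_pq += p * (1 - p)

    return (n_items / (n_items - 1)) * (var_total - sum_pq) / var_total


# Hypothetical data: six individuals (rows), six items (columns).
ITEMS = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

print(f"KR-20 = {kr20(ITEMS):.3f}")
```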

If all assumptions implicit in the split-half method of estimating 
reliability coefficients are satisfied, the split-half and Kuder-Richardson 
formula 20 will yield identical results (Ferguson, 1951). Because these
assumptions are rarely, if ever, satisfied in practice, differences in the
coefficients obtained will result. One difficulty with the split-half
method is that a test may be split in a great many ways, yielding many
different values of $r_{xx}$. It may be shown that if a test is split in all
possible ways, the average of all the split-half reliability coefficients with 
the Spearman-Brown correction is the Kuder-Richardson formula 20. 
This coefficient has a simple unique value for any particular test. 

The Kuder-Richardson formula 20 is a measure of the internal consistency,
or homogeneity, or scalability, of the test material. In this
context these three terms may be considered synonymous. If the items
on a test have high intercorrelations with each other and are measures of
much the same attribute, then the reliability coefficient will be high. If
the intercorrelations are low, either because the items measure different
attributes or because of the presence of error, then the reliability
coefficient will be low.

The Kuder-Richardson formula 20 may be applied to tests comprised
of items which elicit more than two categories of response. Personality
and interest inventories and attitude scales frequently permit three or
more response categories. For a dichotomously scored item we note that
$p_i q_i$ is the item variance $s_i^2$ and $\sum_{i=1}^{n} p_i q_i = \sum_{i=1}^{n} s_i^2$, the sum of the item
variances. For an item with more than two response categories, where
each category has been assigned a weight, the individual item variances
may be calculated and their sum may be substituted in Kuder-Richardson
formula 20 for $\sum_{i=1}^{n} p_i q_i$. Consider a test comprised of statements which
elicit the possible responses "agree," "undecided," "disagree." Let $p_1$,
$p_2$, and $p_3$ be the proportion of individuals responding in the three
categories. If weights 3, 2, 1 or +1, 0, -1, or any other system of
weights, are assigned to the categories, the item variance may be
calculated. These may be summed, and the sum substituted for $\sum_{i=1}^{n} p_i q_i$.
The quantity $s_x^2$ is, of course, the variance of scores obtained by summing
items with the assigned weights. For further discussion see Ferguson
(1951).

On the assumption that all test items are of equal difficulty, a simplified
form of the Kuder-Richardson formula may be obtained for use with
dichotomously scored test items. This formula may be written as

$$r_{xx} = \frac{n}{n-1}\left[1 - \frac{\bar{X}(n - \bar{X})}{n s_x^2}\right] \qquad (23.8)$$

where $\bar{X}$ is the mean test score and $s_x^2$ is the variance. This formula is
referred to as Kuder-Richardson formula 21. The formula may be
derived using the assumptions implicit in the concept of randomly
parallel tests (Sec. 23.9).
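
Because formula 21 needs only the number of items, the mean, and the variance of total scores, it can be computed without item-level data. The sketch below is illustrative: the total scores are hypothetical and the function name `kr21` is an assumption.

```python
def kr21(n_items, totals):
    """Kuder-Richardson formula 21 from total scores on an n-item test."""
    n_people = len(totals)
    mean_total = sum(totals) / n_people
    # Variance with N in the denominator, as for formula 20.
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_people
    correction = mean_total * (n_items - mean_total) / (n_items * var_total)
    return (n_items / (n_items - 1)) * (1 - correction)


# Hypothetical total scores of six individuals on a six-item test.
TOTALS = [6, 4, 3, 2, 1, 0]

print(f"KR-21 = {kr21(6, TOTALS):.3f}")
```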


23.5

Effect of test length on the reliability coefficient 

In discussing split-half reliability, a formula was given for estimating the 
reliability of a whole test from the reliability of a half test. This formula
is a particular case of a more general Spearman-Brown formula for
estimating increased reliability with increased test length. The more
general formula is

$$r_{kk} = \frac{k r_{xx}}{1 + (k - 1) r_{xx}} \qquad (23.9)$$

where $r_{xx}$ = an estimate of reliability of a test of unit length
$r_{kk}$ = reliability of test made k times as long

If $r_{xx} = .50$ and the test is made four times as long, the reliability
coefficient $r_{kk}$ for the lengthened test is estimated as .80. From a theoretical
point of view a test may be made as reliable as we like by increasing its
length. Practical considerations, of course, restrict test length.

Because reliability is a function of test length, reliability coefficients
calculated on tests of different lengths are, for certain purposes, not
directly comparable. If, for example, we wish to compare the reliability
of different types of test material, we presumably should require measures
which were independent of the differing lengths of the tests. One procedure
here is to use the Spearman-Brown formula and calculate reliability
coefficients for a standard test of 100 items. If a test has 40 items,
then a value of $k = 100/40 = 2.50$ would be used in estimating the
reliability of the standard test. If another test has 150 items, then
$k = 100/150 = .67$, and so on. Thus a comparison of the reliabilities of
different tests may be made which is independent of differing test lengths.
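
The general formula and the 100-item standardization can be sketched as follows. The function names and the reliabilities of the 40-item and 150-item tests (.70 and .90) are hypothetical values chosen only for illustration.

```python
def spearman_brown(r_xx, k):
    """General Spearman-Brown formula: reliability of a test made k times as long."""
    return k * r_xx / (1 + (k - 1) * r_xx)


def standardized_reliability(r_xx, n_items, standard_length=100):
    """Reliability re-expressed for a test of `standard_length` items."""
    k = standard_length / n_items
    return spearman_brown(r_xx, k)


# Example from the text: r_xx = .50, test made four times as long.
print(f"{spearman_brown(0.50, 4):.2f}")

# Hypothetical tests of 40 and 150 items put on a common 100-item basis.
print(f"{standardized_reliability(0.70, 40):.3f}")   # k = 2.50
print(f"{standardized_reliability(0.90, 150):.3f}")  # k = .67
```

Setting k = 2 recovers the split-half case of formula 23.6.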


23.6 

Effect of measurement error on the 
sampling variance of the mean 

Because measurement error affects the variance of a set of measurements 
it will also affect the sampling variance of the mean. The sampling 
variance of the arithmetic mean may be written as

$$\sigma_{\bar{X}}^2 = \frac{\sigma_x^2}{N} = \frac{\sigma_T^2}{N} + \frac{\sigma_e^2}{N} \qquad (23.10)$$

The component $\sigma_T^2/N$ is the sampling variance of the means of samples of
true measurements, and $\sigma_e^2/N$ is the component of the sampling variance
attributable to measurement error. While measurement error exerts no
systematic effect on the sample mean as an estimate of $\mu$, such error
increases the variation in sample means with repeated sampling. The
increase in sampling variance over that with no measurement error
present is $\sigma_e^2/N$.





The ratio of the sampling variance of the mean of true scores to the
sampling variance of the mean of obtained scores is the reliability
coefficient. Thus

$$\rho_{xx} = \frac{\sigma_T^2}{\sigma_x^2} = \frac{\sigma_T^2/N}{\sigma_x^2/N} = \frac{\sigma_{\bar{T}}^2}{\sigma_{\bar{X}}^2} \qquad (23.11)$$

This means that the reliability coefficient may be interpreted as descriptive
of the loss in efficiency of estimation resulting from measurement
error. To illustrate, a mean calculated on a sample of 100 cases, where
$\rho_{xx} = .80$, has a sampling variance equal to that of a mean calculated on a
sample of 80 cases where $\rho_{xx} = 1.00$. The loss in efficiency of estimation
resulting from measurement error amounts to 20 cases in 100.


23.7

Effect of errors of measurement 
on the correlation coefficient 

Errors of measurement tend to reduce the size of the correlation coeffi- 
cient. The correlation between true scores will tend to be greater than 
the correlation between obtained scores. If $\rho_{xy}$ is the correlation between
X and Y in the population, the relation between the correlation of true
and obtained scores is given by

$$\rho_{T_x T_y} = \frac{\rho_{xy}}{\sqrt{\rho_{xx}\rho_{yy}}} \qquad (23.12)$$

where $\rho_{T_x T_y}$ = correlation between true scores
$\rho_{xx}$ = reliability of X
$\rho_{yy}$ = reliability of Y

This formula is known as the correction for attenuation. Errors tend to
attenuate the correlation coefficient between obtained scores from the
correlation between true scores. For a derivation of this formula and a
discussion of the simplifying assumptions involved, see Walker and Lev
(1953). The corresponding sample form of the correction for attenuation
is

$$r_{T_x T_y} = \frac{r_{xy}}{\sqrt{r_{xx} r_{yy}}} \qquad (23.13)$$

To illustrate, let $r_{xy} = .60$, $r_{xx} = .80$, and $r_{yy} = .90$. The correlation
between true scores on X and Y, estimated by the above formula, is .707.
The correlation may be viewed as attenuated from .707 to .60 because of 
errors of measurement. The squares of these coefficients yield a better 
appreciation of the loss in predictive capacity due to errors of measurement.
The squares of .707 and .60 are .50 and .36. We conclude that
the presence of errors of measurement results in a 14 per cent loss in
predictive capacity. If the correlation between two variables is low, the
correlation will not be markedly increased by improvements in reliability.
If the correlation is high, improving reliability may result in substantial
gains in the prediction of one variable from another.

Because the correlation between true scores can never exceed unity, 
the maximum correlation between two variables arises where $r_{T_x T_y} = 1$.
Under this circumstance $r_{xy} = \sqrt{r_{xx} r_{yy}}$. This is an estimate of the
maximum correlation between X and Y. If $r_{xx} = .80$ and $r_{yy} = .90$, the
maximum possible correlation between X and Y is estimated as

$$\sqrt{.80 \times .90} = .85$$
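
The two calculations of this section can be checked with a short Python sketch. The function names are assumptions introduced for the example; the numerical values are those used in the text.

```python
from math import sqrt


def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Formula 23.13: estimated correlation between true scores."""
    return r_xy / sqrt(r_xx * r_yy)


def max_observed_correlation(r_xx, r_yy):
    """Largest obtainable r_xy when the true-score correlation is 1."""
    return sqrt(r_xx * r_yy)


r_true = correct_for_attenuation(0.60, 0.80, 0.90)
print(f"estimated true-score correlation = {r_true:.3f}")
print(f"loss in predictive capacity = {r_true ** 2 - 0.60 ** 2:.2f}")
print(f"maximum r_xy = {max_observed_correlation(0.80, 0.90):.2f}")
```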


23.8 

Reliability of difference scores 

Situations arise where the difference between two sets of measurements is 
defined as a score. The two measurements may be initial, or pre- 
stimulus values, and values obtained in the presence of a stimulus factor. 
If differences are obtained between standard scores on X and Y, that is,
between $z_x$ and $z_y$, the reliability of the differences may be estimated by

$$r_{dd} = \frac{r_{xx} + r_{yy} - 2 r_{xy}}{2 - 2 r_{xy}} \qquad (23.14)$$

where $r_{xx}$ and $r_{yy}$ = reliability coefficients for X and Y
$r_{dd}$ = reliability of difference $z_x - z_y$

For fixed values of $r_{xx}$ and $r_{yy}$ the reliability of the difference will decrease
with increase in $r_{xy}$ from zero. If $r_{xx} = .90$ and $r_{yy} = .80$, for $r_{xy} = .80$
the reliability of differences $r_{dd} = .25$. For $r_{xy} = 0$, $r_{dd} = .85$. As $r_{xy}$
departs in a positive direction from zero, the error variance accounts for 
an increasing proportion of the total variance of differences, with a 
resulting decrease in reliability. The point to note here is that difference 
scores may be grossly unreliable and should be used only after careful 
scrutiny of the data. When the correlation between the two variables is 
reasonably high, it is probable that with many sets of data most of the 
variance of differences is error variance. 
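
A brief Python sketch shows how quickly the reliability of a difference falls as the two measures become more highly correlated. The function name is an assumption; the reliabilities .90 and .80 are those of the example above, and the intermediate values of $r_{xy}$ are chosen only for illustration.

```python
def difference_reliability(r_xx, r_yy, r_xy):
    """Formula 23.14: reliability of the difference between standard scores."""
    return (r_xx + r_yy - 2 * r_xy) / (2 - 2 * r_xy)


# Reliabilities from the text's example, with increasing r_xy.
for r_xy in (0.0, 0.2, 0.4, 0.6, 0.8):
    r_dd = difference_reliability(0.90, 0.80, r_xy)
    print(f"r_xy = {r_xy:.1f}  r_dd = {r_dd:.2f}")
```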


23.9

The standard error of measurement 

Because $\rho_{xx} = \sigma_T^2/\sigma_x^2$ and $\sigma_x^2 = \sigma_T^2 + \sigma_e^2$, we may write

$$\sigma_e^2 = \sigma_x^2 (1 - \rho_{xx}) \qquad (23.15)$$

and

$$\sigma_e = \sigma_x \sqrt{1 - \rho_{xx}} \qquad (23.16)$$

This latter formula is the standard error of measurement. Where $s_x$ and
$r_{xx}$ are used as estimates of $\sigma_x$ and $\rho_{xx}$, we obtain

$$s_e = s_x \sqrt{1 - r_{xx}} \qquad (23.17)$$

as the corresponding sample estimate. If it may be assumed that errors of
measurement are independent of the magnitude of test score, then $s_e$ may
be used as the standard error associated with a single score and interpreted
in the same way as the standard error of any statistic. On the assumption
of a normal-curve approximation, the 95 and 99 per cent confidence
intervals of an individual's score $X_i$ are estimated by $X_i \pm 1.96 s_e$ and
$X_i \pm 2.58 s_e$, respectively. With most psychological tests, however, errors
of measurement are not independent of the magnitude of test score. The
standard error is higher in the middle-score range and diminishes in size
as the score departs from the average. Because of this the use of $s_e$ to
estimate confidence intervals for particular scores may yield misleading
results. The variance $s_e^2$ is a sort of average value, and $s_e$ when applied to
particular scores has meaning only in relation to scores near the average.
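
Formula 23.17 and the normal-curve intervals can be sketched as below. The standard deviation of 10, the reliability of .91, and the score of 50 are hypothetical values, and the function names are assumptions made for the example.

```python
from math import sqrt


def standard_error_of_measurement(s_x, r_xx):
    """Formula 23.17: s_e = s_x * sqrt(1 - r_xx)."""
    return s_x * sqrt(1 - r_xx)


def score_interval(x, s_e, z=1.96):
    """Normal-curve interval x +/- z * s_e (z = 1.96 or 2.58)."""
    return x - z * s_e, x + z * s_e


# Hypothetical test: standard deviation 10, reliability .91.
s_e = standard_error_of_measurement(10, 0.91)
low95, high95 = score_interval(50, s_e)
low99, high99 = score_interval(50, s_e, z=2.58)
print(f"s_e = {s_e:.2f}")
print(f"95 per cent interval: {low95:.2f} to {high95:.2f}")
print(f"99 per cent interval: {low99:.2f} to {high99:.2f}")
```

As the text cautions, such intervals assume that error is independent of score level and are most defensible for scores near the average.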

The problem of the standard error of measurement associated with 
psychological test scores has been investigated by Lord (1955a, 1955b,
1957). Lord defines the standard error of measurement as the standard
deviation of scores an individual might be expected to obtain on a large
number of randomly parallel test forms. The assumption is that the
ability of the individual remains unchanged and is not affected by practice,
fatigue, and the like. Randomly parallel forms are viewed as composed
of items drawn at random from a large pool or population of items.
The items are scored 1 for a pass and 0 for a failure, a score on a test
being the sum of item scores. The proportion of items in the population
which individual i can do correctly is $\theta_i$. The true score of individual i
for a test of n items is $T_i = n\theta_i$. The number of items done correctly by
individual i for a random sample of n items is $X_i$. The standard deviation
of the sampling distribution of the $X_i$'s is the standard error. This
is obtained from the standard deviation of the binomial and is given by

$$\sigma_e(X_i) = \sqrt{n\theta_i(1 - \theta_i)} = \sqrt{\frac{1}{n} T_i (n - T_i)} \qquad (23.18)$$

An individual's score $X_i$ may be used as an estimate of $T_i$. Introducing
the factor $n/(n - 1)$ to obtain an unbiased estimate yields

$$s_e(X_i) = \sqrt{\frac{X_i (n - X_i)}{n - 1}} \qquad (23.19)$$

This formula may be used for estimating the standard error of a test score
$X_i$. Where $n = 100$ and $X_i = 50$, $s_e(X_i) = 5.02$. Where $X_i = 80$ and
$n = 100$, $s_e(X_i) = 4.02$. The standard error diminishes in size as the
more extreme values are approached.

Lord (1955a) has shown that if $s_e^2$ is taken as the average of $s_e^2(X_i)$
and substituted in $r_{xx} = 1 - s_e^2/s_x^2$, unbiased variance estimates being
used throughout, Kuder-Richardson formula 21, described in Sec. 23.4,
is obtained.

In most practical situations where parallel tests are used, the tests
are not randomly parallel in the strictest sense. The items are matched
to some extent. The standard error for such tests will be less than that
estimated by $s_e(X_i)$. Thus $s_e(X_i)$ in most situations will tend to be a
moderate overestimate. It is of interest to note that $s_e(X_i)$ is independent
of the characteristics of the items of which a test is comprised,
provided, of course, that these are scored 1 for a pass and 0 for a failure.
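
The score-specific standard error under the randomly parallel model depends only on the score and the number of items, so it is easily tabulated. The sketch below is illustrative; the function name `binomial_standard_error` is an assumption introduced here.

```python
from math import sqrt


def binomial_standard_error(x, n_items):
    """Standard error of a score x on an n-item test: sqrt(x(n - x)/(n - 1))."""
    return sqrt(x * (n_items - x) / (n_items - 1))


# Standard errors across the score range of a 100-item test.
for score in (50, 60, 70, 80, 90, 100):
    print(f"X = {score:3d}  s_e = {binomial_standard_error(score, 100):.2f}")
```

The values shrink toward zero at the extremes of the score range, in line with the remark that the standard error diminishes as more extreme values are approached.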


23.10 

Concluding observations 

The theory and method associated with the study of measurement error
in psychology have been developed in relation to psychological testing.
Much of this theory and method is generally applicable to measurements
of all kinds. Little attention has been directed to the study of measurement
error by experimental psychologists. It is probable that in much
work in the field of human and animal learning, fairly gross error attaches
to many of the measurements made. Reliability coefficients less than .50
are not uncommon, and coefficients of zero are perhaps not isolated
curiosities. The errors which attach to measurements in the field of animal
experimentation are known quite often to be substantial. Low reliability
does not necessarily invalidate a technique as a device for drawing valid
inferences. Low reliability may be compensated for by increase in sample
size. An unreliable technique used with a small sample is, however,
capable of detecting gross differences only, and the probability of not
rejecting the null hypothesis when it is false may be high. When significant
results are reported with an unreliable technique on a small sample,
the treatment applied is usually exerting a gross effect.

A common type of experimental design requires the making of meas- 
urements on an experimental group in the presence of a treatment and on 



Partial and Multiple Correlation


24.1

Introduction


Previous discussion of correlation has been concerned with the relationship
between two variables. In many investigations data on more than
two variables are gathered and forms of multivariate analysis are required.
Two forms of correlational analysis which may be applied to multivariate
data are partial and multiple correlation. Partial correlation deals with
the residual relationship between two variables where the common
influence of one or more other variables has been removed. Multiple
correlation deals with the calculation of weights which produce the maximum
possible correlation between a criterion variable and the weighted sum of
two or more predictor variables. Its purpose is to maximize the efficiency
of prediction. Other forms of multivariate analysis exist, but
these are beyond the scope of the present elementary discussion.


24.2 

Partial correlation 

Let us assume that a test of intelligence and a test of psychomotor ability 
have been administered to a group of children showing considerable 
variation in age. Both intelligence and psychomotor ability increase 
with age. Ten-year-old children are on the average more intelligent than
six-year-old children. They also have more highly developed psycho- 
motor abilities. Scores on the two tests will correlate with each other 
because both are correlated with age. Partial correlation may be used 
with such data to obtain a measure of correlation with the effect of age 
eliminated or removed. 

What is meant by eliminating, or removing, the effect of a third
variable? These terms in the present context have a precise statistical
meaning. Let $X_1$, $X_2$, and $X_3$ be three variables. All or part of the
correlation between $X_1$ and $X_2$ may result because both are correlated
with $X_3$. The reader will recall from previous discussion on correlation
that a score on $X_1$ may be divided into two parts. One part is a score
predicted from $X_3$. The other part is the residual, or error of estimate,
in predicting $X_1$ from $X_3$. These two parts are independent, or
uncorrelated. Similarly, a score on $X_2$ may be divided into two parts, a part
predictable from $X_3$ and a residual, or error of estimate, in predicting
$X_2$ from $X_3$. The correlation between the two sets of residuals, or errors
of estimate, in predicting $X_1$ from $X_3$ and $X_2$ from $X_3$ is the partial
correlation coefficient. It is the part of the correlation which remains when
the effect of the third variable is eliminated, or removed.

The formula for calculating the partial correlation coefficient to
eliminate a third variable is

$$r_{12.3} = \frac{r_{12} - r_{13} r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}} \qquad (24.1)$$

The notation $r_{12.3}$ means the correlation between residuals when $X_3$ has
been removed from both $X_1$ and $X_2$. This is sometimes called a first-order
partial correlation coefficient.

Let $X_1$ and $X_2$ be scores on an intelligence and a psychomotor test
for a group of school children. Let $X_3$ be age. Let the correlations
between the three variables be as follows: $r_{12} = .55$, $r_{13} = .60$, and
$r_{23} = .50$. The partial correlation coefficient is

$$r_{12.3} = \frac{.55 - .60 \times .50}{\sqrt{(1 - .60^2)(1 - .50^2)}} = .36$$

Using a variance interpretation, the proportion overlap between $X_1$ and
$X_2$ is $r_{12}^2 = .55^2 = .303$. The proportion overlap with $X_3$ eliminated
is $r_{12.3}^2 = .36^2 = .130$. The proportion overlap which results from the
effects of age is $.303 - .130 = .173$. It would also be appropriate to
state that the percentage of the total association present resulting from
the effect of age is $(.173/.303)100 = 57$ per cent. The remaining 43 per
cent of the association results from other factors.
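
The first-order partial correlation and the variance interpretation can be reproduced with a few lines of Python. The function name `partial_r` is an assumption; the three correlations are those of the example (.55, .60, and .50).

```python
from math import sqrt


def partial_r(r12, r13, r23):
    """Formula 24.1: correlation of variables 1 and 2 with variable 3 removed."""
    return (r12 - r13 * r23) / sqrt((1 - r13 ** 2) * (1 - r23 ** 2))


r12, r13, r23 = 0.55, 0.60, 0.50
r12_3 = partial_r(r12, r13, r23)

overlap_total = r12 ** 2           # proportion overlap between 1 and 2
overlap_partial = r12_3 ** 2       # overlap with age eliminated
due_to_age = overlap_total - overlap_partial

print(f"r12.3 = {r12_3:.2f}")
print(f"overlap due to age = {due_to_age:.3f}")
print(f"share of association due to age = {100 * due_to_age / overlap_total:.0f} per cent")
```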

Partial correlation may be used to remove the effect of more than one 
variable. The partial correlation between $X_1$ and $X_2$ with the effects of
both $X_3$ and $X_4$ removed is

$$r_{12.34} = \frac{r_{12.4} - r_{13.4} r_{23.4}}{\sqrt{(1 - r_{13.4}^2)(1 - r_{23.4}^2)}} \qquad (24.2)$$


This is a second-order partial correlation coefficient. Because of diffi- 
culties of interpretation, partial correlation coefficients involving the 
elimination of more than one variable are infrequently calculated. 

A t test may be used to test whether a partial correlation coefficient is
significantly different from zero. The required t is

$$t = \frac{r_{12.3}}{\sqrt{(1 - r_{12.3}^2)/(N - 3)}} \qquad (24.3)$$

This may be referred to a table of t with N - 3 degrees of freedom.


24.3

Multiple regression and correlation 

The correlation coefficient may be used to predict or estimate a score on 
an unknown variable from knowledge of a score on a known variable. 
The regression equation in standard-score form is

$$z'_1 = r_{12} z_2$$

where $z'_1$ is a predicted or estimated standard score. In this situation we
have one dependent and one independent variable. If $z_2 = 1.2$ and
$r_{12} = .80$, the best estimate of an individual's standard score on variable 1
is

$$z'_1 = .80 \times 1.2 = .96$$

The estimate is that the individual is .96 standard deviation units above
the average.

We may consider a situation where we have one dependent and two 
independent variables. The dependent variable may be a measure of 
scholastic success. The independent variables may be two psychological 
tests used at university entrance. The dependent variable is spoken of 
as the criterion. The two independent variables are predictors. How 
may scores on the two predictors be combined to predict scholastic 
success? The correlation between the three variables may be arranged 
in a small table. Let these correlations be as follows: 


          1      2      3
    1    1.0     .8     .3
    2     .8    1.0     .5
    3     .3     .5    1.0

Variable 1 is the criterion, and variables 2 and 3 are the predictors. Note
that 1.0's have been entered along the main diagonal. In estimating
standard scores on 1 from standard scores on 2 and 3 separately, the two
regression equations are $z'_1 = .8 z_2$ and $z'_1 = .3 z_3$. Variable 2 is a much
better predictor than variable 3. Presumably, by employing a knowledge
of both 2 and 3, a better estimate of the criterion may be obtained.







Consider the straight sum of standard scores on 2 and 3. If the sums
of the values in the four quadrants of the correlation table are
represented by

    A | C
    --+--
    C | B

the correlation between a standard score on 1 with the sum of standard
scores on 2 and 3 is given by

$$\frac{C}{\sqrt{AB}}$$

In our example this becomes

$$r_{1(2+3)} = \frac{.8 + .3}{\sqrt{1.0 + 1.0 + .5 + .5}} = \frac{1.1}{\sqrt{3}} = .635$$

If we express variables 2 and 3 in standard measure, add them together, 
and correlate the sum with standard scores on the criterion, the correla- 
tion will be .635. This is not as good as the prediction obtained with 
variable 2 taken alone. The straight sum of standard scores assigns 
equal weight to the two variables. When variables are added together 
directly, they are weighted in a manner proportional to their standard 
deviations. The standard deviation of standard scores is 1. Conse- 
quently, on adding together standard scores, the variables are equally 
weighted. 

Let us select some arbitrary set of weights and observe the result. 
Let us assign weights of 4 and 1 to the two predictors. Thus one predictor
will receive four times the weight of the other. Write these
weights along the top and to the side of the correlation table as follows:

                  4      1
          1.0    .8     .3
    4      .8   1.0     .5
    1      .3    .5    1.0

We now multiply the rows and columns by these weights to obtain

          1.0    3.2     .3
          3.2   16.0    2.0
           .3    2.0    1.0










The correlation of the criterion with the sum $4z_2 + z_3$ is again given by
$C/\sqrt{AB}$ and is:

$$r_{1(4z_2 + z_3)} = \frac{3.2 + .3}{\sqrt{16.0 + 1.0 + 2.0 + 2.0}} = .764$$

This particular arrangement of weights, 4 and 1 , results in a correlation 
which is substantially better than that obtained with equal weights. 
Obviously, these are not the best possible weights. The correlation of 
the weighted sum with the criterion is less than that obtained with vari- 
able 2 taken separately. 

How may a set of weights be obtained which will maximize the correlation
between the criterion and the sum of scores on the independent
variables? Let us represent weights by the symbols $\beta_2$ and $\beta_3$. An
estimated standard score on 1 is then given by

$$z'_1 = \beta_2 z_2 + \beta_3 z_3$$

We wish to calculate weights $\beta_2$ and $\beta_3$ such that the correlation between
$z_1$ and $z'_1$ is a maximum. Mathematically, the problem reduces to the
calculation of weights which will minimize the average sum of squares of
differences between the criterion score $z_1$ and the estimated criterion
score $z'_1$. We require values of $\beta_2$ and $\beta_3$ such that

$$\sum (z_1 - z'_1)^2 = \text{a minimum}$$

The values of $\beta_2$ and $\beta_3$ are multiple regression weights for standard
scores. They are sometimes called beta coefficients.

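With three variables the values of $\beta_2$ and $\beta_3$ are given by

$$\beta_2 = \frac{r_{12} - r_{13} r_{23}}{1 - r_{23}^2} \qquad (24.4)$$

$$\beta_3 = \frac{r_{13} - r_{12} r_{23}}{1 - r_{23}^2} \qquad (24.5)$$

In the above example

$$\beta_2 = \frac{.8 - .3 \times .5}{1 - .5^2} = .867$$

$$\beta_3 = \frac{.3 - .8 \times .5}{1 - .5^2} = -.133$$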

Let us write these weights above and to the side of the correlation table
and multiply the rows and columns as follows:

                     .867   -.133
             1.0      .8      .3         1.000    .694   -.040
     .867     .8     1.0      .5          .694    .752   -.058
    -.133     .3      .5     1.0         -.040   -.058    .018

The sums of the elements in the four quadrants are

    1.000 | .654
    ------+-----
     .654 | .654

The correlation between the criterion and the weighted sum is

$$C/\sqrt{AB} = .654/(\sqrt{1.000}\sqrt{.654}) = .809$$

This is a multiple correlation coefficient and may be denoted by R. No
other system of weights will yield a higher correlation between the
criterion and the weighted sum of predictors.

Note that the sum of elements in the top right quadrant of the
weighted correlation table is equal to the sum in the lower right, or C = B.
This circumstance will occur if the weights used are multiple regression
weights. It provides a check on the calculation. We note also that
$R^2 = C$ and $R = \sqrt{C}$. Thus the multiple correlation coefficient may be
obtained by the formula

$$R = \sqrt{\beta_2 r_{12} + \beta_3 r_{13}} \qquad (24.6)$$

This is the commonly used formula for calculating a multiple correlation
coefficient.

In our example the multiple correlation is .809. The correlation of 
variable 2 with the criterion is .8. The addition of the third variable 
increases prediction very slightly. In a practical situation the third vari- 
able could safely be discarded as contributing a negligible amount to the 
efficacy of prediction. 
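
The comparison of equal weights, arbitrary weights, and regression weights can be checked with the following Python sketch. The function names are assumptions made for this example; the correlations are those of the worked example ($r_{12} = .8$, $r_{13} = .3$, $r_{23} = .5$).

```python
from math import sqrt


def composite_correlation(r12, r13, r23, w2, w3):
    """Correlation of the criterion with w2*z2 + w3*z3, i.e. C / sqrt(A*B) with A = 1."""
    c = w2 * r12 + w3 * r13                       # top right quadrant
    b = w2 ** 2 + w3 ** 2 + 2 * w2 * w3 * r23     # lower right quadrant
    return c / sqrt(b)


def beta_weights(r12, r13, r23):
    """Formulas 24.4 and 24.5: regression weights for standard scores."""
    denom = 1 - r23 ** 2
    return (r12 - r13 * r23) / denom, (r13 - r12 * r23) / denom


def multiple_r(r12, r13, r23):
    """Formula 24.6: multiple correlation with two predictors."""
    b2, b3 = beta_weights(r12, r13, r23)
    return sqrt(b2 * r12 + b3 * r13)


r12, r13, r23 = 0.8, 0.3, 0.5
b2, b3 = beta_weights(r12, r13, r23)

print(f"equal weights:      {composite_correlation(r12, r13, r23, 1, 1):.3f}")
print(f"weights 4 and 1:    {composite_correlation(r12, r13, r23, 4, 1):.3f}")
print(f"regression weights: {composite_correlation(r12, r13, r23, b2, b3):.3f}")
print(f"beta_2 = {b2:.3f}, beta_3 = {b3:.3f}, R = {multiple_r(r12, r13, r23):.3f}")
```

With the regression weights the composite correlation coincides with R from formula 24.6, which mirrors the check C = B described above.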

24.4 

The regression equation for raw scores 

The equation $z'_1 = \beta_2 z_2 + \beta_3 z_3$ is a regression equation in standard-score
form. It will yield the best possible linear prediction of a standard score
on 1 from standard scores on 2 and 3. In practice, we usually require a
regression equation for predicting a raw score on 1 from a raw score on
2 and 3. Let $X'_1$ be a predicted raw score on 1, and $X_2$ and $X_3$ the
obtained raw scores on 2 and 3. The estimated standard score $z'_1$ and the
observed standard scores $z_2$ and $z_3$ may be written as

$$z'_1 = \frac{X'_1 - \bar{X}_1}{s_1}, \qquad z_2 = \frac{X_2 - \bar{X}_2}{s_2}, \qquad z_3 = \frac{X_3 - \bar{X}_3}{s_3}$$

By substituting these values in the regression equation in standard-score
form we obtain

$$\frac{X'_1 - \bar{X}_1}{s_1} = \beta_2 \frac{X_2 - \bar{X}_2}{s_2} + \beta_3 \frac{X_3 - \bar{X}_3}{s_3}$$

Rearranging terms and writing the expression explicitly for $X'_1$ yields

$$X'_1 = \beta_2 \frac{s_1}{s_2} X_2 + \beta_3 \frac{s_1}{s_3} X_3 + \left(\bar{X}_1 - \beta_2 \frac{s_1}{s_2} \bar{X}_2 - \beta_3 \frac{s_1}{s_3} \bar{X}_3\right) \qquad (24.7)$$

This is a regression equation in raw-score form. It may be used to predict
a raw score on 1 from a raw score on 2 and 3. The values $\beta_2 s_1/s_2$
and $\beta_3 s_1/s_3$ act as weights. The quantity to the right in parentheses is a
constant.

In the example of the previous section $\beta_2 = .867$ and $\beta_3 = -.133$.
Let us assume that $s_1 = 5$, $s_2 = 10$, $s_3 = 20$; also $\bar{X}_1 = 20$, $\bar{X}_2 = 40$, and
$\bar{X}_3 = 60$. The regression equation in raw-score form is written as

$$X'_1 = (.867)\tfrac{5}{10} X_2 + (-.133)\tfrac{5}{20} X_3 + \left[20 - (.867)\tfrac{5}{10}(40) - (-.133)\tfrac{5}{20}(60)\right]$$
$$= .434 X_2 - .033 X_3 + 4.62$$
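
The conversion from standard-score weights to a raw-score equation can be sketched in Python as follows. The function names are assumptions, and the raw scores passed to `predict` are hypothetical. Note that the sketch carries the weights at full precision, so its constant comes out near 4.66; the value 4.62 in the text results from rounding the weights to .434 and .033 before computing the constant.

```python
def raw_score_equation(betas, s_criterion, s_predictors, mean_criterion, mean_predictors):
    """Formula 24.7: raw-score weights b_j = beta_j * s_1 / s_j and the constant."""
    weights = [b * s_criterion / s for b, s in zip(betas, s_predictors)]
    constant = mean_criterion - sum(w * m for w, m in zip(weights, mean_predictors))
    return weights, constant


def predict(weights, constant, scores):
    """Predicted raw criterion score for one individual's predictor scores."""
    return constant + sum(w * x for w, x in zip(weights, scores))


# Values from the text: betas .867 and -.133, s = 5, 10, 20, means 20, 40, 60.
weights, constant = raw_score_equation(
    betas=[0.867, -0.133],
    s_criterion=5,
    s_predictors=[10, 20],
    mean_criterion=20,
    mean_predictors=[40, 60],
)
print(f"b2 = {weights[0]:.4f}, b3 = {weights[1]:.4f}, constant = {constant:.3f}")

# Hypothetical individual with raw scores of 50 and 60 on the two predictors.
print(f"predicted X1 = {predict(weights, constant, [50, 60]):.2f}")
```

An individual scoring at the mean on both predictors is predicted to score at the criterion mean, which gives a quick check on the constant.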


24.5

The geometry of multiple regression 

Given two variables $X_1$ and $X_2$, each pair of observations may be plotted
as a point on a plane. If interest resides in predicting one variable from 
a knowledge of another, a straight regression line may be fitted to the 
points and this line used for prediction purposes. 

Given three variables $X_1$, $X_2$, and $X_3$, each triplet of observations
may be plotted as a point in a space of three dimensions as shown in
Fig. 24.1. Instead of two axes at right angles to each other, we now
have three. All triplets of observations may be plotted as points. If
the correlations between the three variables are all positive, the assembly
of points will show some tendency to cluster along the diagonal of the
space of three dimensions extending from the origin O to V. A plane
may be fitted to the assembly of points. With two variables a regression
line is fitted to points in a two-dimensional space. With three variables
a regression plane is fitted to points in a three-dimensional space.
In Fig. 24.1 this plane is represented by ABCD. With two variables the
regression equation is the equation for a straight line and is of the type
$X'_1 = b_2 X_2 + a$, where $b_2$ is the slope of the line and a is the point where
the line intercepts the $X_1$ axis. With three variables the regression equation
is the equation for a plane and is of the type $X'_1 = b_2 X_2 + b_3 X_3 + a$.
Here $b_2$ is the slope of the line AD in Fig. 24.1 and $b_3$ is the slope of the
line AB. The constant a is the point where the plane intercepts the
$X_1$ axis. In Fig. 24.1 it is the distance AO.

Fig. 24.1 Geometrical representation of multiple regression. ABCD is a multiple regression plane.

Consider now a particular individual. Represent his score on $X_2$ by
OE and on $X_3$ by OF. We locate the point G in the plane of $X_2$ and $X_3$
and proceed upward until we reach the point H in the regression plane
ABCD. The distance GH is the best estimate of the individual's score on
$X_1$ given his scores on $X_2$ and $X_3$. It is the best estimate in the sense
that the regression plane is so located as to minimize the sums of squares
of deviations from it parallel to the $X_1$ axis.

The reader will observe that the three-variable case is a simple extension
of the two-variable case. A plane is used instead of a straight line.
With four or more variables the idea is essentially the same. With four
variables, in effect, we plot points in a space of four dimensions and fit a
three-dimensional hyperplane to these points. By increasing the number
of variables we may complicate the arithmetic. We do not complicate 
the idea. 


24.6 

More than three variables 

In the discussion above we have considered the multiple regression case
with three variables only, one criterion and two predictors. With k variables
the multiple regression equation in standard-score form is

$$z'_1 = \beta_2 z_2 + \beta_3 z_3 + \cdots + \beta_k z_k \qquad (24.8)$$

The raw-score form of this equation may be obtained, as previously, by
substituting for the values of $z_i$ the values $(X_i - \bar{X}_i)/s_i$ and rearranging
terms. We thereby obtain

$$X'_1 = \beta_2 \frac{s_1}{s_2} X_2 + \beta_3 \frac{s_1}{s_3} X_3 + \cdots + \beta_k \frac{s_1}{s_k} X_k + A \qquad (24.9)$$

where A is given by

$$A = \bar{X}_1 - \beta_2 \frac{s_1}{s_2} \bar{X}_2 - \beta_3 \frac{s_1}{s_3} \bar{X}_3 - \cdots - \beta_k \frac{s_1}{s_k} \bar{X}_k \qquad (24.10)$$

The multiple correlation coefficient is given by

$$R = \sqrt{\beta_2 r_{12} + \beta_3 r_{13} + \cdots + \beta_k r_{1k}} \qquad (24.11)$$

Thus to calculate this coefficient we multiply each correlation of a pre- 
dictor with the criterion by its corresponding regression coefficient, sum 
these products, and take the square root. 

A number of computational procedures exist for calculating the
required regression weights with more than three variables. A widely
used method is the Doolittle method. The method described here originates
with Aitken (1937) and has been called the method of pivotal
condensation. It is described in detail in Thomson (1951).





24.7 

Aitken’s numerical solution 

To illustrate the application of Aitken's method let us consider a problem
with five variables, one criterion and four predictors. Denote the criterion
by $X_1$ and the predictors by $X_2$, $X_3$, $X_4$, and $X_5$. The criterion
may be regarded as a measure of success in an occupation, and the predictors
may be psychological tests used to predict performance in the
occupation.

The intercorrelations between the five variables are shown in Table
24.1. The means and standard deviations of the five variables are shown
in Table 24.2. Table 24.3 shows the procedure for calculating the multiple
regression weights. This procedure requires the successive calculation
of differences between cross products. If the four cell values are

    a   b
    c   d


Table 24.1

Correlation coefficients between a criterion and four predictors

          X1      X2      X3      X4      X5
X1      1.00     .72     .58     .41     .63
X2       .72    1.00     .69     .49     .39
X3       .58     .69    1.00     .38     .19
X4       .41     .49     .38    1.00     .27
X5       .63     .39     .19     .27    1.00


Table 24.2

Means and standard deviations for criterion and four predictors

          X̄         s
X1       8.72     5.68
X2     104.65    16.71
X3      43.22     9.92
X4      14.98     6.32
X5      87.22    14.09






the difference between cross products is 

ad - cb

In this case the cell value a is the pivotal element. 

The steps in the calculation are as follows: 

1 Write down the matrix of intercorrelations between
the predictors, that is, between variables $X_2$, $X_3$, $X_4$,
and $X_5$. Insert 1's along the diagonal. Beneath this
matrix write a row containing the correlations of the
predictors with the criterion. The resulting matrix is
shown to the left of slab A in Table 24.3.

2 To the right of the above matrix record another matrix
with -1's down the diagonal. All other elements are
zero, including those in the bottom row. In Table 24.3
a dot represents a zero.

3 Sum the rows to obtain the values in the check column. 


Table 24.3

Aitken's method for computing regression coefficients*

Slab  (recip.)      X2      X3      X4      X5                                      Check

A                    1     .69     .49     .39      -1       .       .       .       1.57
                   .69       1     .38     .19       .      -1       .       .       1.26
                   .49     .38       1     .27       .       .      -1       .       1.14
                   .39     .19     .27       1       .       .       .      -1        .85
                   .72     .58     .41     .63       .       .       .       .       2.34

B      (1.908)          (.524)    .042   -.079    .690      -1       .       .       .177
                         1.000    .080   -.151   1.317  -1.908       .       .       .338
                          .042    .760    .079    .490       .      -1       .       .371
                         -.079    .079    .848    .390       .       .      -1       .238
                          .083    .057    .349    .720       .       .       .      1.210

C      (1.321)                  (.757)    .085    .435    .080      -1       .       .357
                                 1.000    .112    .575    .106  -1.321       .       .472
                                  .085    .836    .494   -.151       .      -1       .265
                                  .050    .362    .611    .158       .       .      1.182

D      (1.211)                          (.826)    .445   -.160    .112      -1       .225
                                         1.000    .539   -.194    .136  -1.211       .272
                                          .356    .582    .153    .066       .      1.158

E                                                 .390    .222    .018    .431      1.061
                                                 Regression coefficients






4 Calculate the differences between cross products for
the first two rows of slab A, using the 1 in the top left
cell as the pivotal element. Thus the following product
differences are formed:

1 × 1 - .69 × .69 = .524
1 × .38 - .49 × .69 = .042
1 × .19 - .39 × .69 = -.079
1 × 0 - (-1) × .69 = .690
1 × (-1) - 0 × .69 = -1
1 × 0 - 0 × .69 = 0
1 × 0 - 0 × .69 = 0

These values are recorded in the first row of slab B.
The check value is obtained by forming the product
difference

1 × 1.26 - 1.57 × .69 = .177

If the calculation is correct to this point, the sum of elements
in the first row of slab B will equal the product
difference .177.

5 Beneath the first row of slab B write a second version
of it obtained by dividing each element by the top left
element, .524. The result is a row with unity as the
pivot. This assists subsequent calculation. This
part of the procedure is most readily accomplished by
multiplying the elements in the row by the reciprocal
of .524, or by 1.908.

6 The remaining elements in slab B are obtained by
forming product differences using the first row of slab A
with the third, fourth, and fifth rows of slab A, successively,
always using the 1 in the top left cell as the
pivotal element. Thus

1 × .38 - .69 × .49 = .042
1 × 1 - .49 × .49 = .760

and so on. Each row is summed to provide a check on
the calculation. The result is a reduction of the
original 5 × 4 matrix of slab A to the 4 × 3 matrix of
slab B.

7 The procedure is now repeated to obtain slabs C, D,
and E. At each stage, with the exception of the last,
the top row in each slab is divided by the left-hand cell
value, or multiplied by the reciprocal of that value, to
obtain a second version of the top row. The appropriate
reciprocal for row C is 1.321, and for row D it is
1.211.

8 By proceeding with the calculation, the original matrix 
is condensed to the cell values in slab E. These four 
values are the multiple regression coefficients for pre- 
dicting a standard score on the criterion from standard 
scores on the four predictors. 
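The condensation described in steps 1 to 8 may be sketched in Python as follows. This is an illustration only: the function name `aitken_betas` is an assumption, the check column is omitted, and the arithmetic is carried at full machine precision, not rounded to three decimals at each slab, so the results agree with Table 24.3 only to about the third decimal place.

```python
def aitken_betas(r_pred, r_crit):
    """Standard-score regression weights by pivotal condensation (a sketch).

    r_pred: square matrix of intercorrelations between the predictors.
    r_crit: correlations of the predictors with the criterion.
    """
    p = len(r_crit)
    # Slab A: predictor intercorrelations with -1's down the diagonal to the
    # right, and the criterion correlations (followed by zeros) beneath.
    slab = [list(r_pred[i]) + [-1.0 if j == i else 0.0 for j in range(p)]
            for i in range(p)]
    slab.append(list(r_crit) + [0.0] * p)

    for _ in range(p):
        lead = slab[0][0]
        # Second version of the top row, with unity as the pivot.
        top = [value / lead for value in slab[0]]
        # Product differences 1*d - c*b for every remaining row and column.
        slab = [[row[j] - row[0] * top[j] for j in range(1, len(row))]
                for row in slab[1:]]
    # One row remains: the regression coefficients of slab E.
    return slab[0]


# Correlations from Table 24.1 (predictors X2 to X5, criterion X1).
r_pred = [
    [1.00, 0.69, 0.49, 0.39],
    [0.69, 1.00, 0.38, 0.19],
    [0.49, 0.38, 1.00, 0.27],
    [0.39, 0.19, 0.27, 1.00],
]
r_crit = [0.72, 0.58, 0.41, 0.63]

betas = aitken_betas(r_pred, r_crit)
print([round(b, 3) for b in betas])
```

Each pass through the loop corresponds to one slab: the top row is divided by its left-hand element, and the remaining rows lose their first column.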

In this example the regression equation for predicting the criterion 
from the predictors in standard-score form is 

$z_1' = .390 z_2 + .222 z_3 + .018 z_4 + .431 z_5$

No other system of weights will provide a better estimate of the criterion. 
The correlations of the four predictors with the criterion are 

.72 .58 .41 .63 


By multiplying these by the corresponding regression coefficients,
summing the resulting products, and taking the square root, we obtain
the multiple correlation coefficient as follows:

$R = \sqrt{.390 \times .72 + .222 \times .58 + .018 \times .41 + .431 \times .63} = .83$


A multiple correlation coefficient is amenable to the same general type of
interpretation as any other correlation coefficient. It is the correlation
between a criterion variable and the weighted sum of the predictors, the
predictors being weighted in order to maximize that correlation.
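The arithmetic of formula 24.11 for this example may be checked with a short Python sketch. The function name `multiple_r` is an assumption introduced for illustration; the regression coefficients and correlations are those given above.

```python
import math


def multiple_r(betas, r_crit):
    """Multiple correlation from standard-score weights (formula 24.11)."""
    return math.sqrt(sum(b * r for b, r in zip(betas, r_crit)))


betas = [0.390, 0.222, 0.018, 0.431]   # regression coefficients from slab E
r_crit = [0.72, 0.58, 0.41, 0.63]      # correlations of predictors with criterion

R = multiple_r(betas, r_crit)
print(round(R, 2))
```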

To obtain a multiple regression equation in raw-score form we
require the means and standard deviations of Table 24.2. We may write

$X_1' = (.390)\frac{5.68}{16.71} X_2 + (.222)\frac{5.68}{9.92} X_3 + (.018)\frac{5.68}{6.32} X_4 + (.431)\frac{5.68}{14.09} X_5 + A$

The constant A is given by

$A = 8.72 - (.390)\frac{5.68}{16.71}(104.65) - (.222)\frac{5.68}{9.92}(43.22) - (.018)\frac{5.68}{6.32}(14.98) - (.431)\frac{5.68}{14.09}(87.22) = -26.81$


With any substantial number of variables the calculation of multiple 
regression weights is clearly a laborious procedure and requires the use of 
modern computing devices. 
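The conversion from standard-score to raw-score form (formulas 24.9 and 24.10) lends itself to a brief Python sketch. The function names are assumptions made for illustration, and the means and standard deviations are entered as they are read from Table 24.2; the constant the sketch produces therefore depends on those entries as transcribed.

```python
def raw_score_equation(betas, means, sds):
    """Raw-score weights and constant from standard-score weights.

    means[0] and sds[0] belong to the criterion; the remaining entries
    belong to the predictors, in the same order as betas.
    """
    weights = [b * sds[0] / s for b, s in zip(betas, sds[1:])]
    constant = means[0] - sum(w * m for w, m in zip(weights, means[1:]))
    return weights, constant


betas = [0.390, 0.222, 0.018, 0.431]
means = [8.72, 104.65, 43.22, 14.98, 87.22]   # X1 to X5, as read from Table 24.2
sds = [5.68, 16.71, 9.92, 6.32, 14.09]        # X1 to X5, as read from Table 24.2

weights, constant = raw_score_equation(betas, means, sds)


def predict(scores):
    """Estimated criterion score from raw scores on the four predictors."""
    return constant + sum(w * x for w, x in zip(weights, scores))


# An individual at the mean of every predictor is predicted at the criterion mean.
print(predict(means[1:]))
```

A useful check on any raw-score equation is the one shown in the last line: scores at the predictor means must return the mean of the criterion.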





24.8 

The significance of a multiple 
correlation coefficient 


An F ratio may be used to test whether an observed multiple correlation
coefficient is significantly different from zero. The required value of F is
given by the formula

$F = \frac{R^2}{1 - R^2} \cdot \frac{N - k - 1}{k}$    24.12

where R = multiple correlation coefficient
      N = number of observations
      k = number of independent variables or predictors

The table of F is entered with $df_1 = k$ and $df_2 = N - k - 1$.
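Formula 24.12 may be sketched in Python as follows. The value N = 50 is hypothetical, since the number of observations in the preceding example is not stated; R = .83 and k = 4 are taken from that example. The function name `f_for_multiple_r` is an assumption made for illustration.

```python
def f_for_multiple_r(r, n, k):
    """F ratio for testing a multiple correlation against zero (formula 24.12).

    Returns the F ratio together with the two degrees of freedom
    with which the table of F is entered.
    """
    df1 = k
    df2 = n - k - 1
    f_ratio = (r ** 2 / (1 - r ** 2)) * (df2 / df1)
    return f_ratio, df1, df2


# R = .83 with k = 4 predictors; N = 50 is a hypothetical sample size.
f_ratio, df1, df2 = f_for_multiple_r(0.83, 50, 4)
print(round(f_ratio, 2), df1, df2)
```

The resulting value is compared with the tabled critical value of F for the stated degrees of freedom.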


24.9 

Some observations on multiple correlation 

The techniques of multiple correlation have practical application in occu- 
pational and scholastic selection where it becomes necessary to combine 
a number of variables to provide the best possible estimate of a criterion 
measure. An appreciation of the relative contributions of the indepen- 
dent variables in predicting the criterion is not readily grasped by simple 
inspection of the multiple regression coefficients. With two predictors 
the square of the multiple correlation coefficient may be shown equal to 

$R^2 = \beta_2^2 + \beta_3^2 + 2\beta_2\beta_3 r_{23}$

Thus the predicted variance is composed of three additive parts. $\beta_2^2$
represents a contribution by $X_2$, $\beta_3^2$ a contribution by $X_3$, and the term
$2\beta_2\beta_3 r_{23}$ is a component which involves the correlation between $X_2$ and
$X_3$. Thus the evaluation of the relative contributions of the different
variables is not a simple matter of direct comparison of the relative magnitudes
of the regression coefficients but requires also a consideration of
the correlation terms.
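The three additive parts may be examined numerically with a short Python sketch. The correlations used are those among $X_1$, $X_2$, and $X_3$ in Table 24.1; the regression weights are obtained by solving the two normal equations $r_{12} = \beta_2 + \beta_3 r_{23}$ and $r_{13} = \beta_3 + \beta_2 r_{23}$. The function name is an assumption made for illustration.

```python
def two_predictor_betas(r12, r13, r23):
    """Standard-score weights for one criterion and two predictors."""
    d = 1 - r23 ** 2
    return (r12 - r13 * r23) / d, (r13 - r12 * r23) / d


r12, r13, r23 = 0.72, 0.58, 0.69
b2, b3 = two_predictor_betas(r12, r13, r23)

# R squared computed directly, as the sum of weight-by-correlation products.
r_sq_direct = b2 * r12 + b3 * r13

# The same quantity split into its three additive parts.
parts = {
    "X2 term": b2 ** 2,
    "X3 term": b3 ** 2,
    "joint term": 2 * b2 * b3 * r23,
}
r_sq_parts = sum(parts.values())

for name, value in parts.items():
    print(name, round(value, 3))
print(round(r_sq_direct, 3), round(r_sq_parts, 3))
```

With predictors as highly correlated as these, the joint term is not negligible, which is why the regression coefficients alone do not describe the relative contributions.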

Frequently, in practical work, the greater part of the prediction 
achieved can be attributed to a relatively small number of variables, 
perhaps four or five or six, and the inclusion of additional variables con- 
tributes only small and diminishing amounts to prediction. Tests of 
significance may be applied to decide whether or not the addition of one 
or more variables to a subset of variables will significantly improve 
prediction. 

Investigators concerned with problems of prediction frequently
attempt to identify independent variables which show a high correlation
with the criterion and a low correlation with each other. If two variables
have a fairly high correlation with the criterion and a low correlation with
each other, both measure different aspects of the criterion and both will
contribute substantially to prediction. If two variables have a high
correlation with each other, they are measures of much the same thing,
and the inclusion of both, instead of either one or the other, will contribute
little to the prediction achieved.
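The point may be illustrated with a minimal Python sketch using hypothetical correlations. Two predictors each correlate .50 with the criterion; in one case they correlate .10 with each other and in the other .90. The formula used is the two-predictor expression for $R^2$ in terms of the three correlations, and the function name is an assumption made for illustration.

```python
import math


def two_predictor_r(r12, r13, r23):
    """Multiple correlation of a criterion with two predictors."""
    r_sq = (r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23) / (1 - r23 ** 2)
    return math.sqrt(r_sq)


# Hypothetical values: both predictors correlate .50 with the criterion.
low_overlap = two_predictor_r(0.50, 0.50, 0.10)    # predictors nearly unrelated
high_overlap = two_predictor_r(0.50, 0.50, 0.90)   # predictors measure much the same thing

print(round(low_overlap, 3), round(high_overlap, 3))
```

With nearly unrelated predictors the multiple correlation rises well above .50; with highly correlated predictors it barely exceeds what either predictor achieves alone.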

EXERCISES 

1 Given the correlations $r_{12} = .70$, $r_{13} = .50$, and $r_{23} = .60$, compute $r_{12.3}$.
What percentage of the association between variables 1 and 2 results
because of the effect of variable 3?

2 The mean and standard deviation of a criterion variable are $\bar{X}_1 = 24.56$
and $s_1 = 4.52$. The means and standard deviations for two predictor
variables are $\bar{X}_2 = 36.48$, $\bar{X}_3 = 16.95$, and $s_2 = 5.49$, $s_3 = 3.66$. The
correlations are $r_{12} = .70$, $r_{13} = .65$, and $r_{23} = .33$. Compute (a) the
correlation between standard scores on the criterion and the sum of
standard scores on the two predictors, (b) the correlation between raw
scores on the criterion and the sum of raw scores on the two predictors,
(c) the multiple regression equation in standard-score form, (d) the
multiple regression equation in raw-score form, (e) the multiple correlation
coefficient.

3 The following are intercorrelations between first-year university
averages and five university entrance examinations. Means and
standard deviations are also given:

          X1      X2      X3      X4      X5      X6       X̄        s
X1      1.00     .62     .55     .43     .38     .09    72.61    6.56
X2       .62    1.00     .72     .55     .43     .49    62.50    4.28
X3       .55     .72    1.00     .55     .36     .47    58.65    5.33
X4       .43     .55     .55    1.00     .65     .50    65.80    5.77
X5       .38     .43     .36     .65    1.00     .20    69.75    3.91
X6       .09     .49     .47     .50     .20    1.00    71.80    4.45

Compute (a) the multiple regression equation in standard-score form,
(b) the multiple regression equation in raw-score form, (c) the multiple
correlation coefficient, (d) the multiple correlation coefficients obtained
by successively dropping variables 6, 5, and 4.



Appendix


TABLES


A. Ordinates and Areas of the Normal Curve
B. Critical Values of t
C. Critical Values of Chi Square
D. Critical Values of F
E. Transformation of r to z_r
F. Critical Values of the Correlation Coefficient
G. Critical Values of ρ, the Spearman Rank Correlation Coefficient
H. Probabilities Associated with Values as Large as Observed Values of S in the Kendall Rank Correlation Coefficient
I. Critical Values of T in the Wilcoxon Matched-pairs Signed-ranks Test
J. Coefficients of Orthogonal Polynomials
K. Squares and Square Roots of Numbers from 1 to 1,000





Table A

Ordinates and areas of the normal curve*

(In terms of σ units)

 x/σ   Area  Ordinate    x/σ   Area  Ordinate    x/σ   Area  Ordinate

 .00  .0000  .3989       .50  .1915  .3521      1.00  .3413  .2420
 .01  .0040  .3989       .51  .1950  .3503      1.01  .3438  .2396
 .02  .0080  .3989       .52  .1985  .3485      1.02  .3461  .2371
 .03  .0120  .3988       .53  .2019  .3467      1.03  .3485  .2347
 .04  .0160  .3986       .54  .2054  .3448      1.04  .3508  .2323
 .05  .0199  .3984       .55  .2088  .3429      1.05  .3531  .2299
 .06  .0239  .3982       .56  .2123  .3410      1.06  .3554  .2275
 .07  .0279  .3980       .57  .2157  .3391      1.07  .3577  .2251
 .08  .0319  .3977       .58  .2190  .3372      1.08  .3599  .2227
 .09  .0359  .3973       .59  .2224  .3352      1.09  .3621  .2203
 .10  .0398  .3970       .60  .2257  .3332      1.10  .3643  .2179
 .11  .0438  .3965       .61  .2291  .3312      1.11  .3665  .2155
 .12  .0478  .3961       .62  .2324  .3292      1.12  .3686  .2131
 .13  .0517  .3956       .63  .2357  .3271      1.13  .3708  .2107
 .14  .0557  .3951       .64  .2389  .3251      1.14  .3729  .2083
 .15  .0596  .3945       .65  .2422  .3230      1.15  .3749  .2059
 .16  .0636  .3939       .66  .2454  .3209      1.16  .3770  .2036
 .17  .0675  .3932       .67  .2486  .3187      1.17  .3790  .2012
 .18  .0714  .3925       .68  .2517  .3166      1.18  .3810  .1989
 .19  .0753  .3918       .69  .2549  .3144      1.19  .3830  .1965
 .20  .0793  .3910       .70  .2580  .3123      1.20  .3849  .1942
 .21  .0832  .3902       .71  .2611  .3101      1.21  .3869  .1919
 .22  .0871  .3894       .72  .2642  .3079      1.22  .3888  .1895
 .23  .0910  .3885       .73  .2673  .3056      1.23  .3907  .1872
 .24  .0948  .3876       .74  .2703  .3034      1.24  .3925  .1849
 .25  .0987  .3867       .75  .2734  .3011      1.25  .3944  .1826
 .26  .1026  .3857       .76  .2764  .2989      1.26  .3962  .1804
 .27  .1064  .3847       .77  .2794  .2966      1.27  .3980  .1781
 .28  .1103  .3836       .78  .2823  .2943      1.28  .3997  .1758
 .29  .1141  .3825       .79  .2852  .2920      1.29  .4015  .1736
 .30  .1179  .3814       .80  .2881  .2897      1.30  .4032  .1714
 .31  .1217  .3802       .81  .2910  .2874      1.31  .4049  .1691
 .32  .1255  .3790       .82  .2939  .2850      1.32  .4066  .1669
 .33  .1293  .3778       .83  .2967  .2827      1.33  .4082  .1647
 .34  .1331  .3765       .84  .2995  .2803      1.34  .4099  .1626
 .35  .1368  .3752       .85  .3023  .2780      1.35  .4115  .1604
 .36  .1406  .3739       .86  .3051  .2756      1.36  .4131  .1582
 .37  .1443  .3725       .87  .3078  .2732      1.37  .4147  .1561
 .38  .1480  .3712       .88  .3106  .2709      1.38  .4162  .1539
 .39  .1517  .3697       .89  .3133  .2685      1.39  .4177  .1518
 .40  .1554  .3683       .90  .3159  .2661      1.40  .4192  .1497
 .41  .1591  .3668       .91  .3186  .2637      1.41  .4207  .1476
 .42  .1628  .3653       .92  .3212  .2613      1.42  .4222  .1456
 .43  .1664  .3637       .93  .3238  .2589      1.43  .4236  .1435
 .44  .1700  .3621       .94  .3264  .2565      1.44  .4251  .1415
 .45  .1736  .3605       .95  .3289  .2541      1.45  .4265  .1394
 .46  .1772  .3589       .96  .3315  .2516      1.46  .4279  .1374
 .47  .1808  .3572       .97  .3340  .2492      1.47  .4292  .1354
 .48  .1844  .3555       .98  .3365  .2468      1.48  .4306  .1334
 .49  .1879  .3538       .99  .3389  .2444      1.49  .4319  .1315
 .50  .1915  .3521      1.00  .3413  .2420      1.50  .4332  .1295






Table A (continued)

 x/σ   Area  Ordinate    x/σ   Area  Ordinate    x/σ   Area  Ordinate

1.50  .4332  .1295      2.00  .4772  .0540      2.50  .4938  .0175
1.51  .4345  .1276      2.01  .4778  .0529      2.51  .4940  .0171
1.52  .4357  .1257      2.02  .4783  .0519      2.52  .4941  .0167
1.53  .4370  .1238      2.03  .4788  .0508      2.53  .4943  .0163
1.54  .4382  .1219      2.04  .4793  .0498      2.54  .4945  .0158
1.55  .4394  .1200      2.05  .4798  .0488      2.55  .4946  .0154
1.56  .4406  .1182      2.06  .4803  .0478      2.56  .4948  .0151
1.57  .4418  .1163      2.07  .4808  .0468      2.57  .4949  .0147
1.58  .4429  .1145      2.08  .4812  .0459      2.58  .4951  .0143
1.59  .4441  .1127      2.09  .4817  .0449      2.59  .4952  .0139
1.60  .4452  .1109      2.10  .4821  .0440      2.60  .4953  .0136
1.61  .4463  .1092      2.11  .4826  .0431      2.61  .4955  .0132
1.62  .4474  .1074      2.12  .4830  .0422      2.62  .4956  .0129
1.63  .4484  .1057      2.13  .4834  .0413      2.63  .4957  .0126
1.64  .4495  .1040      2.14  .4838  .0404      2.64  .4959  .0122
1.65  .4505  .1023      2.15  .4842  .0395      2.65  .4960  .0119
1.66  .4515  .1006      2.16  .4846  .0387      2.66  .4961  .0116
1.67  .4525  .0989      2.17  .4850  .0379      2.67  .4962  .0113
1.68  .4535  .0973      2.18  .4854  .0371      2.68  .4963  .0110
1.69  .4545  .0957      2.19  .4857  .0363      2.69  .4964  .0107
1.70  .4554  .0940      2.20  .4861  .0355      2.70  .4965  .0104
1.71  .4564  .0925      2.21  .4864  .0347      2.71  .4966  .0101
1.72  .4573  .0909      2.22  .4868  .0339      2.72  .4967  .0099
1.73  .4582  .0893      2.23  .4871  .0332      2.73  .4968  .0096
1.74  .4591  .0878      2.24  .4875  .0325      2.74  .4969  .0093
1.75  .4599  .0863      2.25  .4878  .0317      2.75  .4970  .0091
1.76  .4608  .0848      2.26  .4881  .0310      2.76  .4971  .0088
1.77  .4616  .0833      2.27  .4884  .0303      2.77  .4972  .0086
1.78  .4625  .0818      2.28  .4887  .0297      2.78  .4973  .0084
1.79  .4633  .0804      2.29  .4890  .0290      2.79  .4974  .0081
1.80  .4641  .0790      2.30  .4893  .0283      2.80  .4974  .0079
1.81  .4649  .0775      2.31  .4896  .0277      2.81  .4975  .0077
1.82  .4656  .0761      2.32  .4898  .0270      2.82  .4976  .0075
1.83  .4664  .0748      2.33  .4901  .0264      2.83  .4977  .0073
1.84  .4671  .0734      2.34  .4904  .0258      2.84  .4977  .0071
1.85  .4678  .0721      2.35  .4906  .0252      2.85  .4978  .0069
1.86  .4686  .0707      2.36  .4909  .0246      2.86  .4979  .0067
1.87  .4693  .0694      2.37  .4911  .0241      2.87  .4979  .0065
1.88  .4699  .0681      2.38  .4913  .0235      2.88  .4980  .0063
1.89  .4706  .0669      2.39  .4916  .0229      2.89  .4981  .0061
1.90  .4713  .0656      2.40  .4918  .0224      2.90  .4981  .0060
1.91  .4719  .0644      2.41  .4920  .0219      2.91  .4982  .0058
1.92  .4726  .0632      2.42  .4922  .0213      2.92  .4982  .0056
1.93  .4732  .0620      2.43  .4925  .0208      2.93  .4983  .0055
1.94  .4738  .0608      2.44  .4927  .0203      2.94  .4984  .0053
1.95  .4744  .0596      2.45  .4929  .0198      2.95  .4984  .0051
1.96  .4750  .0584      2.46  .4931  .0194      2.96  .4985  .0050
1.97  .4756  .0573      2.47  .4932  .0189      2.97  .4985  .0048
1.98  .4761  .0562      2.48  .4934  .0184      2.98  .4986  .0047
1.99  .4767  .0551      2.49  .4936  .0180      2.99  .4986  .0046
2.00  .4772  .0540      2.50  .4938  .0175      3.00  .4987  .0044










Table B

Critical values of t*

              Level of significance for one-tailed test
  df      .10      .05     .025      .01     .005    .0005

              Level of significance for two-tailed test
          .20      .10      .05      .02      .01     .001

   1    3.078    6.314   12.706   31.821   63.657  636.619
   2    1.886    2.920    4.303    6.965    9.925   31.598
   3    1.638    2.353    3.182    4.541    5.841   12.941
   4    1.533    2.132    2.776    3.747    4.604    8.610
   5    1.476    2.015    2.571    3.365    4.032    6.859
   6    1.440    1.943    2.447    3.143    3.707    5.959
   7    1.415    1.895    2.365    2.998    3.499    5.405
   8    1.397    1.860    2.306    2.896    3.355    5.041
   9    1.383    1.833    2.262    2.821    3.250    4.781
  10    1.372    1.812    2.228    2.764    3.169    4.587
  11    1.363    1.796    2.201    2.718    3.106    4.437
  12    1.356    1.782    2.179    2.681    3.055    4.318
  13    1.350    1.771    2.160    2.650    3.012    4.221
  14    1.345    1.761    2.145    2.624    2.977    4.140
  15    1.341    1.753    2.131    2.602    2.947    4.073
  16    1.337    1.746    2.120    2.583    2.921    4.015
  17    1.333    1.740    2.110    2.567    2.898    3.965
  18    1.330    1.734    2.101    2.552    2.878    3.922
  19    1.328    1.729    2.093    2.539    2.861    3.883
  20    1.325    1.725    2.086    2.528    2.845    3.850
  21    1.323    1.721    2.080    2.518    2.831    3.819
  22    1.321    1.717    2.074    2.508    2.819    3.792
  23    1.319    1.714    2.069    2.500    2.807    3.767
  24    1.318    1.711    2.064    2.492    2.797    3.745
  25    1.316    1.708    2.060    2.485    2.787    3.725
  26    1.315    1.706    2.056    2.479    2.779    3.707
  27    1.314    1.703    2.052    2.473    2.771    3.690
  28    1.313    1.701    2.048    2.467    2.763    3.674
  29    1.311    1.699    2.045    2.462    2.756    3.659
  30    1.310    1.697    2.042    2.457    2.750    3.646
  40    1.303    1.684    2.021    2.423    2.704    3.551
  60    1.296    1.671    2.000    2.390    2.660    3.460
 120    1.289    1.658    1.980    2.358    2.617    3.373
   ∞    1.282    1.645    1.960    2.326    2.576    3.291







Table C

Critical values of chi square*

                        Probability under H0 that χ² ≥ chi square

df     .99     .98     .95     .90     .80     .70     .50     .30     .20     .10     .05     .02     .01    .001

 1  .00016  .00063   .0039    .016    .064     .15     .46    1.07    1.64    2.71    3.84    5.41    6.64   10.83
 2     .02     .04     .10     .21     .45     .71    1.39    2.41    3.22    4.60    5.99    7.82    9.21   13.82
 3     .12     .18     .35     .58    1.00    1.42    2.37    3.66    4.64    6.25    7.82    9.84   11.34   16.27
 4     .30     .43     .71    1.06    1.65    2.20    3.36    4.88    5.99    7.78    9.49   11.67   13.28   18.46
 5     .55     .75    1.14    1.61    2.34    3.00    4.35    6.06    7.29    9.24   11.07   13.39   15.09   20.52
 6     .87    1.13    1.64    2.20    3.07    3.83    5.35    7.23    8.56   10.64   12.59   15.03   16.81   22.46
 7    1.24    1.56    2.17    2.83    3.82    4.67    6.35    8.38    9.80   12.02   14.07   16.62   18.48   24.32
 8    1.65    2.03    2.73    3.49    4.59    5.53    7.34    9.52   11.03   13.36   15.51   18.17   20.09   26.12
 9    2.09    2.53    3.32    4.17    5.38    6.39    8.34   10.66   12.24   14.68   16.92   19.68   21.67   27.88
10    2.56    3.06    3.94    4.86    6.18    7.27    9.34   11.78   13.44   15.99   18.31   21.16   23.21   29.59
11    3.05    3.61    4.58    5.58    6.99    8.15   10.34   12.90   14.63   17.28   19.68   22.62   24.72   31.26
12    3.57    4.18    5.23    6.30    7.81    9.03   11.34   14.01   15.81   18.55   21.03   24.05   26.22   32.91
13    4.11    4.76    5.89    7.04    8.63    9.93   12.34   15.12   16.98   19.81   22.36   25.47   27.69   34.53
14    4.66    5.37    6.57    7.79    9.47   10.82   13.34   16.22   18.15   21.06   23.68   26.87   29.14   36.12
15    5.23    5.98    7.26    8.55   10.31   11.72   14.34   17.32   19.31   22.31   25.00   28.26   30.58   37.70
16    5.81    6.61    7.96    9.31   11.15   12.62   15.34   18.42   20.46   23.54   26.30   29.63   32.00   39.29
17    6.41    7.26    8.67   10.08   12.00   13.53   16.34   19.51   21.62   24.77   27.59   31.00   33.41   40.75
18    7.02    7.91    9.39   10.86   12.86   14.44   17.34   20.60   22.76   25.99   28.87   32.35   34.80   42.31
19    7.63    8.57   10.12   11.65   13.72   15.35   18.34   21.69   23.90   27.20   30.14   33.69   36.19   43.82
20    8.26    9.24   10.85   12.44   14.58   16.27   19.34   22.78   25.04   28.41   31.41   35.02   37.57   45.32
21    8.90    9.92   11.59   13.24   15.44   17.18   20.34   23.86   26.17   29.62   32.67   36.34   38.93   46.80
22    9.54   10.60   12.34   14.04   16.31   18.10   21.34   24.94   27.30   30.81   33.92   37.66   40.29   48.27
23   10.20   11.29   13.09   14.85   17.19   19.02   22.34   26.02   28.43   32.01   35.17   38.97   41.64   49.73
24   10.86   11.99   13.85   15.66   18.06   19.94   23.34   27.10   29.55   33.20   36.42   40.27   42.98   51.18
25   11.52   12.70   14.61   16.47   18.94   20.87   24.34   28.17   30.68   34.38   37.65   41.57   44.31   52.62
26   12.20   13.41   15.38   17.29   19.82   21.79   25.34   29.25   31.80   35.56   38.88   42.86   45.64   54.05
27   12.88   14.12   16.15   18.11   20.70   22.72   26.34   30.32   32.91   36.74   40.11   44.14   46.96   55.48
28   13.56   14.85   16.93   18.94   21.59   23.65   27.34   31.39   34.03   37.92   41.34   45.42   48.28   56.89
29   14.26   15.57   17.71   19.77   22.48   24.58   28.34   32.46   35.14   39.09   42.56   46.69   49.59   58.30
30   14.95   16.31   18.49   20.60   23.36   25.51   29.34   33.53   36.25   40.26   43.77   47.96   50.89   59.70



Table D

Critical values of F*

5 per cent (roman type) and 1 per cent (bold-face type) points for the distribution of F

[Table body not recoverable from the scanned source.]

- M 

— H 

1 - o 
-o © 

£ 00 

O' «o 

i © 

o © 

p* Cl 
- © 

o ' 8 

c c- 
o © 

- to 

c r © 

X »4 

is © 

is CD 

X PI 

~* © 
s © 

© o 
f to 

7 © 

x m 

i * n 

r o 

r - « 

•s 0 

1- »4 

— CO 

— CO 

— CO 

~ eo 

— « 

- © 

— © 

— © 

— © 

- e« 

— © 

— w 

- © 

- n 

— OT 

O (0 

© 0 

© 

■c 0 

J CO 

a © 

| - © 

cr © 

is CD 

c* © 

-* © 

O to 

c tn 

— © 

O' © 

<X « 

O' N 

- o 
s © 

•s r» 

X PI 

s © 

x to 

- 0 

X co 

© 0 
© 

O © 

r- © 

© CO 

- CO 

— CO 

— eo 

- © 

— M 

— CO 

— © 

— N 

— « 

— © 

— © 

— © 

- eo 

— ' OT 

© CO 

c r- 

P 

C C- 

-1 o 

: c- 

p © 

c © 

to 

^ © 

or ,h 

O' © 

l . © 
O' 0 

IS © 

O' 0 

- i t-l 

O' 0 

O e- 
o © 

© 

SJ © 

1 - *4 

0 O © 

is C- 
X n 

- © 

x n 

p © 
x to 

r 1 CO 

© CO 

© CO 

© CO 

— © 

— © 

— © 

— © 

— © 

— © 

— eo 

« « 

— « 

— « 

- CO 

? CO 

C to 

p © 

o © 

r- tO 

O t- 

■S © 

c r- 

© © 
w t- 

c O 

c t- 

— t- 

c © 

- © 

c* © 

l- © 

C* 0 

IS © 

c 0 

-«■ sn 

O 0 

© © 

O' 0 

O 0 
0 © 

O © 
00 © 

CS *4 

00 © 

© CO 

IN CO 

fN co 

© CO 

Cl CO 

ci eo 

Cl N 

— © 

— « 

— © 

© 

— © 

— © 

— CO 

— CO 

r co 
— o» 

-* o 

— © 

© tO 
— © 

— © 
— to 

o « 

— a 

/ © 
C h 

r- 
c c- 

IS © 

j f- 

PI © 

o © 

— ■ 0 

o © 

O' « 

o © 

OS) © 

C- © 

O 0 

©> 0 

•o to 
O' 0 

© »4 

0 0 

© CO 

IN CO 

in eo 

n CO 

Cl © 

in ci 

N M 

ci © 

© M 

N © 

© © 

— © 

- M 

— © 

- M 

ID 
© o 

— © 

© O 

o eo 

© © 

on to 
— © 

1 - © 

— a> 

ISl to 

— © 

© *H 
- © 


z> © 

— 0 

X* © 

c e- 

r~ © 

O f* 

is rt 

C- I* 

sc 0 

O 0 

© 0 

=> 0 

.. © 

O 0 

© CO 

•N n 

IN Cl 

IN CO 

Cl © 

IN CO 

Cl CO 

ci © 

© M 

© © 

'i «’ 

© eo 

© « 

© eo 

© eo 

Z> CO 

P> CO 

c o 

p CO 

O' ■ 

IN v4 

1 ' © 
>N *4 

> d © 

Cl w* 

^:s 

nS 

— © 

S 1 © 

O' © 

— © 

1 ' 0 

- © 

o © 

— o> 

© © 

- 0 

© 0 
- 0 

© « 
— 0 

? o 

O 0 

© CO 

IN to 

© cl 

IN CO 

Cl to 

IN W 

© to 

© n 

© « 

'i CO 

© CO 

© © 

© © 

© © 

© OT 

© © 

- 0 - <0 

— CO 

© © 

© © 

on t» 

I'D M 

i - © 

P> © 

O Tt 

rc n 

to © 
ID © 

Cl 0 
fC © 

o © 
s © 

O h 

1*1 *4 

I'. © 

Cl *4 

© r4 
© *4 

P 0 

© o 

© © 
© o 

©s 

in ci 

© to 

© Cl 

ci M 

in n 

IN tO 

IN PI 

C| to 

© to 

© © 

ci CO 

© n 

© to 

© n 

© PI 

i~ <o 

ID C- 

c © 

1 r t- 

C CO 
isi C- 

— 0 

«n © 

Cl to 

is © 

- « 

ID © 

O © 

ID © 

3* 

vC d 

© 0 

© (*• 

© © 

to © 

© © 

— *4 

© © 

O' 0 
p, © 


1 - ©~ 

tt to 

IN CO 

«n n 

IN to 

<N tO 

ci tO 

© to 

IN 10 

© m 

© 0 

© PI 

© n 

© ci 

© n 

Cl to 

n to 

- © 

00 CO 

O 00 

00 CO 

0 © 

r- CO 

ao © 
r ■» H 

O © 

(' H 

ID O 
r- co 

it 

ss 

O 0 

r- 0 

rr © 

o © 

1 . w4 

© 9 

ID 0 

C 0 

© to 

© 0 

oS 

O 0 ~ 

0 r- 

© © 

IN ©’ 

IN © 

IN © 

Cl © 

© © 

IN © 

© © 

© ci 

cl tO 

© to 

© to 

© to 

© m 

© m 

o © 

<N H 

© OO 

- o 

23~ 

— © 

ID © 

— © 

© •" 
- 01 

_ ID «' 

— 0 

C 0 
— 0 

ss 

© 

° 

’ 8 S 

© *4 
° **! 

© 0 
C 0 

88 ' 

"00 

0 0 

»d 00 

ID to 

pi o 

ID © 

© © 

tO © 

to © 

PI © 

to © 

to © 

iD © 

0 © 

to © 

tO © 

© © 

wTio 

O CO 

o * 

Cl c- 

© *H 

“"© CO 

O © 

-8T 

"**■ 

”»h" 
cr © 

T*“ 

sX 

© © 

© 0 

— “p4 

0 0 

0 C* 

«■© 
0 >1 

*ss" 

0 s 

© h 

© h 

© *- 

© h 

© ► 

to r- 

Cl h 

tO 0 

PI * 

pi © 

PI © 

ID 0 

ID 0 

PI 0 

p> 0 

© 

© 

oo 

© 

o 

ID 

ID 

© 

s 

© 

© 

O 

c» 

S 

1 

0 

© 

o 

ID 

s 

© 

§ 

© 

t 

V 


412 


Appendix 


Table E 

Transformation of r to z_r*

    r     z_r       r     z_r       r     z_r       r     z_r       r     z_r
  .000   .000     .200   .203     .400   .424     .600   .693     .800   1.099
  .005   .005     .205   .208     .405   .430     .605   .701     .805   1.113
  .010   .010     .210   .213     .410   .436     .610   .709     .810   1.127
  .015   .015     .215   .218     .415   .442     .615   .717     .815   1.142
  .020   .020     .220   .224     .420   .448     .620   .725     .820   1.157
  .025   .025     .225   .229     .425   .454     .625   .733     .825   1.172
  .030   .030     .230   .234     .430   .460     .630   .741     .830   1.188
  .035   .035     .235   .239     .435   .466     .635   .750     .835   1.204
  .040   .040     .240   .245     .440   .472     .640   .758     .840   1.221
  .045   .045     .245   .250     .445   .478     .645   .767     .845   1.238
  .050   .050     .250   .255     .450   .485     .650   .775     .850   1.256
  .055   .055     .255   .261     .455   .491     .655   .784     .855   1.274
  .060   .060     .260   .266     .460   .497     .660   .793     .860   1.293
  .065   .065     .265   .271     .465   .504     .665   .802     .865   1.313
  .070   .070     .270   .277     .470   .510     .670   .811     .870   1.333
  .075   .075     .275   .282     .475   .517     .675   .820     .875   1.354
  .080   .080     .280   .288     .480   .523     .680   .829     .880   1.376
  .085   .085     .285   .293     .485   .530     .685   .838     .885   1.398
  .090   .090     .290   .299     .490   .536     .690   .848     .890   1.422
  .095   .095     .295   .304     .495   .543     .695   .858     .895   1.447
  .100   .100     .300   .310     .500   .549     .700   .867     .900   1.472
  .105   .105     .305   .315     .505   .556     .705   .877     .905   1.499
  .110   .110     .310   .321     .510   .563     .710   .887     .910   1.528
  .115   .116     .315   .326     .515   .570     .715   .897     .915   1.557
  .120   .121     .320   .332     .520   .576     .720   .908     .920   1.589
  .125   .126     .325   .337     .525   .583     .725   .918     .925   1.623
  .130   .131     .330   .343     .530   .590     .730   .929     .930   1.658
  .135   .136     .335   .348     .535   .597     .735   .940     .935   1.697
  .140   .141     .340   .354     .540   .604     .740   .950     .940   1.738
  .145   .146     .345   .360     .545   .611     .745   .962     .945   1.783
  .150   .151     .350   .365     .550   .618     .750   .973     .950   1.832
  .155   .156     .355   .371     .555   .626     .755   .984     .955   1.886
  .160   .161     .360   .377     .560   .633     .760   .996     .960   1.946
  .165   .167     .365   .383     .565   .640     .765  1.008     .965   2.014
  .170   .172     .370   .388     .570   .648     .770  1.020     .970   2.092
  .175   .177     .375   .394     .575   .655     .775  1.033     .975   2.185
  .180   .182     .380   .400     .580   .662     .780  1.045     .980   2.298
  .185   .187     .385   .406     .585   .670     .785  1.058     .985   2.443
  .190   .192     .390   .412     .590   .678     .790  1.071     .990   2.647
  .195   .198     .395   .418     .595   .685     .795  1.085     .995   2.994
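The entries of Table E agree with Fisher's z transformation, z_r = ½ ln[(1 + r)/(1 − r)], which is the inverse hyperbolic tangent of r. The following is a minimal sketch, not part of the original text, showing how a value outside the tabled grid might be computed; the function names `r_to_z` and `z_to_r` are illustrative choices.

```python
import math


def r_to_z(r: float) -> float:
    """Fisher's z transformation: z_r = 0.5 * ln((1 + r) / (1 - r))."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return 0.5 * math.log((1.0 + r) / (1.0 - r))


def z_to_r(z: float) -> float:
    """Inverse transformation: r = tanh(z_r)."""
    return math.tanh(z)


if __name__ == "__main__":
    # Print a few values in the same three-decimal form as the table.
    for r in (0.200, 0.500, 0.800, 0.995):
        print(f"r = {r:.3f}   z_r = {r_to_z(r):.3f}")
```

Rounded to three decimals, the values printed for r = .200, .500, .800, and .995 can be compared directly with the corresponding rows of the table.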





Table F 

Critical values of the correlation coefficient*

         Level of significance for one-tailed test
          .05      .025      .01      .005
         Level of significance for two-tailed test
   df     .10      .05       .02      .01
    1    .988     .997     .9995    .9999
    2    .900     .950     .980     .990
    3    .805     .878     .934     .959
    4    .729     .811     .882     .917
    5    .669     .754     .833     .874
    6    .622     .707     .789     .834
    7    .582     .666     .750     .798
    8    .549     .632     .716     .765
    9    .521     .602     .685     .735
   10    .497     .576     .658     .708
   11    .476     .553     .634     .684
   12    .458     .532     .612     .661
   13    .441     .514     .592     .641
   14    .426     .497     .574     .623
   15    .412     .482     .558     .606
   16    .400     .468     .542     .590
   17    .389     .456     .528     .575
   18    .378     .444     .516     .561
   19    .369     .433     .503     .549
   20    .360     .423     .492     .537
   21    .352     .413     .482     .526
   22    .344     .404     .472     .515
   23    .337     .396     .462     .505
   24    .330     .388     .453     .496
   25    .323     .381     .445     .487
   26    .317     .374     .437     .479
   27    .311     .367     .430     .471
   28    .306     .361     .423     .463
   29    .301     .355     .416     .456
   30    .296     .349     .409     .449
   35    .275     .325     .381     .418
   40    .257     .304     .358     .393
   45    .243     .288     .338     .372
   50    .231     .273     .322     .354
   60    .211     .250     .295     .325
   70    .195     .232     .274     .303
   80    .183     .217     .256     .283
   90    .173     .205     .242     .267
  100    .164     .195     .230     .254




Table G 

Critical values of ρ, the Spearman rank correlation coefficient*

         Significance level (one-tailed test)
    N        .05        .01
    4       1.000
    5        .900      1.000
    6        .829       .943
    7        .714       .893
    8        .643       .833
    9        .600       .783
   10        .564       .746
   12        .506       .712
   14        .456       .645
   16        .425       .601
   18        .399       .564
   20        .377       .534
   22        .359       .508
   24        .343       .485
   26        .329       .465
   28        .317       .448
   30        .306       .432





Table H 

Probabilities associated with values as large as observed values
of S in the Kendall rank correlation coefficient*

               Values of N                             Values of N
    S      4        5        8          9         S      6         7         10
    0    .625     .592     .548       .540        1    .500      .500       .500
    2    .375     .408     .452       .460        3    .360      .386       .431
    4    .167     .242     .360       .381        5    .235      .281       .364
    6    .042     .117     .274       .306        7    .136      .191       .300
    8             .042     .199       .238        9    .068      .119       .242
   10             .0083    .138       .179       11    .028      .068       .190
   12                      .089       .130       13    .0083     .035       .146
   14                      .054       .090       15    .0014     .015       .108
   16                      .031       .060       17              .0054      .078
   18                      .016       .038       19              .0014      .054
   20                      .0071      .022       21              .00020     .036
   22                      .0028      .012       23                         .023
   24                      .00087     .0063      25                         .014
   26                      .00019     .0029      27                         .0083
   28                      .000025    .0012      29                         .0046
   30                                 .00043     31                         .0023
   32                                 .00012     33                         .0011
   34                                 .000025    35                         .00047
   36                                 .0000028   37                         .00018
                                                 39                         .000058
                                                 41                         .000015
                                                 43                         .0000028
                                                 45                         .00000028
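Probabilities of this kind can be obtained by exact enumeration. The sketch below is an illustration, not the method used to prepare the table. It assumes no tied ranks and takes S as the number of concordant pairs minus the number of discordant pairs, so that S = N(N − 1)/2 − 2I, where I is the number of inversions in the ranking. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial


def inversion_counts(n: int) -> list[int]:
    """counts[k] = number of permutations of n items with exactly k inversions."""
    counts = [1]
    for m in range(2, n + 1):
        new = [0] * (len(counts) + m - 1)
        for k, c in enumerate(counts):
            # Placing the m-th item can create 0, 1, ..., m - 1 new inversions.
            for j in range(m):
                new[k + j] += c
        counts = new
    return counts


def prob_s_at_least(n: int, s: int) -> Fraction:
    """Exact P(S >= s) under the null hypothesis for n untied ranks."""
    pairs = comb(n, 2)
    if (pairs - s) % 2 != 0:
        raise ValueError("S must have the same parity as n(n - 1)/2")
    max_inversions = (pairs - s) // 2
    if max_inversions < 0:
        return Fraction(0)
    counts = inversion_counts(n)
    return Fraction(sum(counts[: max_inversions + 1]), factorial(n))


if __name__ == "__main__":
    for n, s in ((4, 0), (4, 6), (5, 0), (6, 1)):
        p = prob_s_at_least(n, s)
        print(f"N = {n}  S = {s}  P = {float(p):.3f}")
```

The printed probabilities can be compared with the entries for N = 4, 5, and 6 in the table.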






Table I 

Critical values of T in the Wilcoxon matched-pairs
signed-ranks test*

         Level of significance for one-tailed test
           .025       .01       .005
         Level of significance for two-tailed test
    N      .05        .02       .01
    6       0
    7       2          0
    8       4          2         0
    9       6          3         2
   10       8          5         3
   11      11          7         5
   12      14         10         7
   13      17         13        10
   14      21         16        13
   15      25         20        16
   16      30         24        20
   17      35         28        23
   18      40         33        28
   19      46         38        32
   20      52         43        38
   21      59         49        43
   22      66         56        49
   23      73         62        55
   24      81         69        61
   25      89         77        68





Table J 

Coefficients of orthogonal polynomials

  k   Polynomial    Coefficients c_ij                               Σc_ij²
  3   Linear         -1    0    1                                        2
      Quadratic       1   -2    1                                        6

  4   Linear         -3   -1    1    3                                  20
      Quadratic       1   -1   -1    1                                   4
      Cubic          -1    3   -3    1                                  20

  5   Linear         -2   -1    0    1    2                             10
      Quadratic       2   -1   -2   -1    2                             14
      Cubic          -1    2    0   -2    1                             10
      Quartic         1   -4    6   -4    1                             70

  6   Linear         -5   -3   -1    1    3    5                        70
      Quadratic       5   -1   -4   -4   -1    5                        84
      Cubic          -5    7    4   -4   -7    5                       180
      Quartic         1   -3    2    2   -3    1                        28

  7   Linear         -3   -2   -1    0    1    2    3                   28
      Quadratic       5    0   -3   -4   -3    0    5                   84
      Cubic          -1    1    1    0   -1   -1    1                    6
      Quartic         3   -7    1    6    1   -7    3                  154

  8   Linear         -7   -5   -3   -1    1    3    5    7             168
      Quadratic       7    1   -3   -5   -5   -3    1    7             168
      Cubic          -7    5    7    3   -3   -7   -5    7             264
      Quartic         7  -13   -3    9    9   -3  -13    7             616
      Quintic        -7   23  -17  -15   15   17  -23    7            2184

  9   Linear         -4   -3   -2   -1    0    1    2    3    4         60
      Quadratic      28    7   -8  -17  -20  -17   -8    7   28       2772
      Cubic         -14    7   13    9    0   -9  -13   -7   14        990
      Quartic        14  -21  -11    9   18    9  -11  -21   14       2002
      Quintic        -4   11   -4   -9    0    9    4  -11    4        468

 10   Linear         -9   -7   -5   -3   -1    1    3    5    7    9   330
      Quadratic       6    2   -1   -3   -4   -4   -3   -1    2    6   132
      Cubic         -42   14   35   31   12  -12  -31  -35  -14   42  8580
      Quartic        18  -22  -17    3   18   18    3  -17  -22   18  2860
      Quintic        -6   14   -1  -11   -6    6   11    1  -14    6   780
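The defining properties of these coefficients are easy to verify: each set sums to zero, the sum of cross products of any two sets for the same k is zero, and the sum of squares equals the value in the last column. The following is a minimal sketch of such a check for k = 5, added here for illustration; the dictionary and function names are hypothetical.

```python
from itertools import combinations

# Coefficients for k = 5 as listed in the table, with their tabled sums of squares.
COEFFS_K5 = {
    "linear": ([-2, -1, 0, 1, 2], 10),
    "quadratic": ([2, -1, -2, -1, 2], 14),
    "cubic": ([-1, 2, 0, -2, 1], 10),
    "quartic": ([1, -4, 6, -4, 1], 70),
}


def dot(a: list[int], b: list[int]) -> int:
    """Sum of cross products of two coefficient sets."""
    return sum(x * y for x, y in zip(a, b))


def is_valid_set(table: dict[str, tuple[list[int], int]]) -> bool:
    """Check zero sums, tabled sums of squares, and mutual orthogonality."""
    for coeffs, sum_sq in table.values():
        if sum(coeffs) != 0 or dot(coeffs, coeffs) != sum_sq:
            return False
    for (a, _), (b, _) in combinations(table.values(), 2):
        if dot(a, b) != 0:
            return False
    return True


if __name__ == "__main__":
    print("k = 5 coefficients consistent:", is_valid_set(COEFFS_K5))
```

The same check can be applied to any other value of k by entering the corresponding rows.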










Table K 

Squares and square roots of numbers from 1 to 1,000*

  Number    Square   Square root      Number    Square   Square root
       1         1      1.0000            41      1681      6.4031
       2         4      1.4142            42      1764      6.4807
       3         9      1.7321            43      1849      6.5574
       4        16      2.0000            44      1936      6.6332
       5        25      2.2361            45      2025      6.7082
       6        36      2.4495            46      2116      6.7823
       7        49      2.6458            47      2209      6.8557
       8        64      2.8284            48      2304      6.9282
       9        81      3.0000            49      2401      7.0000
      10       100      3.1623            50      2500      7.0711
      11       121      3.3166            51      2601      7.1414
      12       144      3.4641            52      2704      7.2111
      13       169      3.6056            53      2809      7.2801
      14       196      3.7417            54      2916      7.3485
      15       225      3.8730            55      3025      7.4162
      16       256      4.0000            56      3136      7.4833
      17       289      4.1231            57      3249      7.5498
      18       324      4.2426            58      3364      7.6158
      19       361      4.3589            59      3481      7.6811
      20       400      4.4721            60      3600      7.7460
      21       441      4.5826            61      3721      7.8102
      22       484      4.6904            62      3844      7.8740
      23       529      4.7958            63      3969      7.9373
      24       576      4.8990            64      4096      8.0000
      25       625      5.0000            65      4225      8.0623
      26       676      5.0990            66      4356      8.1240
      27       729      5.1962            67      4489      8.1854
      28       784      5.2915            68      4624      8.2462
      29       841      5.3852            69      4761      8.3066
      30       900      5.4772            70      4900      8.3666
      31       961      5.5678            71      5041      8.4261
      32      1024      5.6569            72      5184      8.4853
      33      1089      5.7446            73      5329      8.5440
      34      1156      5.8310            74      5476      8.6023
      35      1225      5.9161            75      5625      8.6603
      36      1296      6.0000            76      5776      8.7178
      37      1369      6.0828            77      5929      8.7750
      38      1444      6.1644            78      6084      8.8318
      39      1521      6.2450            79      6241      8.8882
      40      1600      6.3246            80      6400      8.9443










Table K (continued) 


  Number    Square   Square root      Number    Square   Square root
      81      6561      9.0000           121     14641     11.0000
      82      6724      9.0554           122     14884     11.0454
      83      6889      9.1104           123     15129     11.0905
      84      7056      9.1652           124     15376     11.1355
      85      7225      9.2195           125     15625     11.1803
      86      7396      9.2736           126     15876     11.2250
      87      7569      9.3274           127     16129     11.2694
      88      7744      9.3808           128     16384     11.3137
      89      7921      9.4340           129     16641     11.3578
      90      8100      9.4868           130     16900     11.4018
      91      8281      9.5394           131     17161     11.4455
      92      8464      9.5917           132     17424     11.4891
      93      8649      9.6437           133     17689     11.5326
      94      8836      9.6954           134     17956     11.5758
      95      9025      9.7468           135     18225     11.6190
      96      9216      9.7980           136     18496     11.6619
      97      9409      9.8489           137     18769     11.7047
      98      9604      9.8995           138     19044     11.7473
      99      9801      9.9499           139     19321     11.7898
     100     10000     10.0000           140     19600     11.8322
     101     10201     10.0499           141     19881     11.8743
     102     10404     10.0995           142     20164     11.9164
     103     10609     10.1489           143     20449     11.9583
     104     10816     10.1980           144     20736     12.0000
     105     11025     10.2470           145     21025     12.0416
     106     11236     10.2956           146     21316     12.0830
     107     11449     10.3441           147     21609     12.1244
     108     11664     10.3923           148     21904     12.1655
     109     11881     10.4403           149     22201     12.2066
     110     12100     10.4881           150     22500     12.2474
     111     12321     10.5357           151     22801     12.2882
     112     12544     10.5830           152     23104     12.3288
     113     12769     10.6301           153     23409     12.3693
     114     12996     10.6771           154     23716     12.4097
     115     13225     10.7238           155     24025     12.4499
     116     13456     10.7703           156     24336     12.4900
     117     13689     10.8167           157     24649     12.5300
     118     13924     10.8628           158     24964     12.5698
     119     14161     10.9087           159     25281     12.6095
     120     14400     10.9545           160     25600     12.6491









Table K (continued) 


  Number    Square   Square root      Number    Square   Square root
     161     25921     12.6886           201     40401     14.1774
     162     26244     12.7279           202     40804     14.2127
     163     26569     12.7671           203     41209     14.2478
     164     26896     12.8062           204     41616     14.2829
     165     27225     12.8452           205     42025     14.3178
     166     27556     12.8841           206     42436     14.3527
     167     27889     12.9228           207     42849     14.3875
     168     28224     12.9615           208     43264     14.4222
     169     28561     13.0000           209     43681     14.4568
     170     28900     13.0384           210     44100     14.4914
     171     29241     13.0767           211     44521     14.5258
     172     29584     13.1149           212     44944     14.5602
     173     29929     13.1529           213     45369     14.5945
     174     30276     13.1909           214     45796     14.6287
     175     30625     13.2288           215     46225     14.6629
     176     30976     13.2665           216     46656     14.6969
     177     31329     13.3041           217     47089     14.7309
     178     31684     13.3417           218     47524     14.7648
     179     32041     13.3791           219     47961     14.7986
     180     32400     13.4164           220     48400     14.8324
     181     32761     13.4536           221     48841     14.8661
     182     33124     13.4907           222     49284     14.8997
     183     33489     13.5277           223     49729     14.9332
     184     33856     13.5647           224     50176     14.9666
     185     34225     13.6015           225     50625     15.0000
     186     34596     13.6382           226     51076     15.0333
     187     34969     13.6748           227     51529     15.0665
     188     35344     13.7113           228     51984     15.0997
     189     35721     13.7477           229     52441     15.1327
     190     36100     13.7840           230     52900     15.1658
     191     36481     13.8203           231     53361     15.1987
     192     36864     13.8564           232     53824     15.2315
     193     37249     13.8924           233     54289     15.2643
     194     37636     13.9284           234     54756     15.2971
     195     38025     13.9642           235     55225     15.3297
     196     38416     14.0000           236     55696     15.3623
     197     38809     14.0357           237     56169     15.3948
     198     39204     14.0712           238     56644     15.4272
     199     39601     14.1067           239     57121     15.4596
     200     40000     14.1421           240     57600     15.4919










Table K (continued) 


  Number    Square   Square root      Number    Square   Square root
     241     58081     15.5242           281     78961     16.7631
     242     58564     15.5563           282     79524     16.7929
     243     59049     15.5885           283     80089     16.8226
     244     59536     15.6205           284     80656     16.8523
     245     60025     15.6525           285     81225     16.8819
     246     60516     15.6844           286     81796     16.9115
     247     61009     15.7162           287     82369     16.9411
     248     61504     15.7480           288     82944     16.9706
     249     62001     15.7797           289     83521     17.0000
     250     62500     15.8114           290     84100     17.0294
     251     63001     15.8430           291     84681     17.0587
     252     63504     15.8745           292     85264     17.0880
     253     64009     15.9060           293     85849     17.1172
     254     64516     15.9374           294     86436     17.1464
     255     65025     15.9687           295     87025     17.1756
     256     65536     16.0000           296     87616     17.2047
     257     66049     16.0312           297     88209     17.2337
     258     66564     16.0624           298     88804     17.2627
     259     67081     16.0935           299     89401     17.2916
     260     67600     16.1245           300     90000     17.3205
     261     68121     16.1555           301     90601     17.3494
     262     68644     16.1864           302     91204     17.3781
     263     69169     16.2173           303     91809     17.4069
     264     69696     16.2481           304     92416     17.4356
     265     70225     16.2788           305     93025     17.4642
     266     70756     16.3095           306     93636     17.4929
     267     71289     16.3401           307     94249     17.5214
     268     71824     16.3707           308     94864     17.5499
     269     72361     16.4012           309     95481     17.5784
     270     72900     16.4317           310     96100     17.6068
     271     73441     16.4621           311     96721     17.6352
     272     73984     16.4924           312     97344     17.6635
     273     74529     16.5227           313     97969     17.6918
     274     75076     16.5529           314     98596     17.7200
     275     75625     16.5831           315     99225     17.7482
     276     76176     16.6132           316     99856     17.7764
     277     76729     16.6433           317    100489     17.8045
     278     77284     16.6733           318    101124     17.8326
     279     77841     16.7033           319    101761     17.8606
     280     78400     16.7332           320    102400     17.8885










Table K (continued) 


  Number    Square   Square root      Number    Square   Square root
     321    103041     17.9165           361    130321     19.0000
     322    103684     17.9444           362    131044     19.0263
     323    104329     17.9722           363    131769     19.0526
     324    104976     18.0000           364    132496     19.0788
     325    105625     18.0278           365    133225     19.1050
     326    106276     18.0555           366    133956     19.1311
     327    106929     18.0831           367    134689     19.1572
     328    107584     18.1108           368    135424     19.1833
     329    108241     18.1384           369    136161     19.2094
     330    108900     18.1659           370    136900     19.2354
     331    109561     18.1934           371    137641     19.2614
     332    110224     18.2209           372    138384     19.2873
     333    110889     18.2483           373    139129     19.3132
     334    111556     18.2757           374    139876     19.3391
     335    112225     18.3030           375    140625     19.3649
     336    112896     18.3303           376    141376     19.3907
     337    113569     18.3576           377    142129     19.4165
     338    114244     18.3848           378    142884     19.4422
     339    114921     18.4120           379    143641     19.4679
     340    115600     18.4391           380    144400     19.4936
     341    116281     18.4662           381    145161     19.5192
     342    116964     18.4932           382    145924     19.5448
     343    117649     18.5203           383    146689     19.5704
     344    118336     18.5472           384    147456     19.5959
     345    119025     18.5742           385    148225     19.6214
     346    119716     18.6011           386    148996     19.6469
     347    120409     18.6279           387    149769     19.6723
     348    121104     18.6548           388    150544     19.6977
     349    121801     18.6815           389    151321     19.7231
     350    122500     18.7083           390    152100     19.7484
     351    123201     18.7350           391    152881     19.7737
     352    123904     18.7617           392    153664     19.7990
     353    124609     18.7883           393    154449     19.8242
     354    125316     18.8149           394    155236     19.8494
     355    126025     18.8414           395    156025     19.8746
     356    126736     18.8680           396    156816     19.8997
     357    127449     18.8944           397    157609     19.9249
     358    128164     18.9209           398    158404     19.9499
     359    128881     18.9473           399    159201     19.9750
     360    129600     18.9737           400    160000     20.0000








Table K (continued) 


  Number    Square   Square root      Number    Square   Square root
     401    160801     20.0250           441    194481     21.0000
     402    161604     20.0499           442    195364     21.0238
     403    162409     20.0749           443    196249     21.0476
     404    163216     20.0998           444    197136     21.0713
     405    164025     20.1246           445    198025     21.0950
     406    164836     20.1494           446    198916     21.1187
     407    165649     20.1742           447    199809     21.1424
     408    166464     20.1990           448    200704     21.1660
     409    167281     20.2237           449    201601     21.1896
     410    168100     20.2485           450    202500     21.2132
     411    168921     20.2731           451    203401     21.2368
     412    169744     20.2978           452    204304     21.2603
     413    170569     20.3224           453    205209     21.2838
     414    171396     20.3470           454    206116     21.3073
     415    172225     20.3715           455    207025     21.3307
     416    173056     20.3961           456    207936     21.3542
     417    173889     20.4206           457    208849     21.3776
     418    174724     20.4450           458    209764     21.4009
     419    175561     20.4695           459    210681     21.4243
     420    176400     20.4939           460    211600     21.4476
     421    177241     20.5183           461    212521     21.4709
     422    178084     20.5426           462    213444     21.4942
     423    178929     20.5670           463    214369     21.5174
     424    179776     20.5913           464    215296     21.5407
     425    180625     20.6155           465    216225     21.5639
     426    181476     20.6398           466    217156     21.5870
     427    182329     20.6640           467    218089     21.6102
     428    183184     20.6882           468    219024     21.6333
     429    184041     20.7123           469    219961     21.6564
     430    184900     20.7364           470    220900     21.6795
     431    185761     20.7605           471    221841     21.7025
     432    186624     20.7846           472    222784     21.7256
     433    187489     20.8087           473    223729     21.7486
     434    188356     20.8327           474    224676     21.7715
     435    189225     20.8567           475    225625     21.7945
     436    190096     20.8806           476    226576     21.8174
     437    190969     20.9045           477    227529     21.8403
     438    191844     20.9284           478    228484     21.8632
     439    192721     20.9523           479    229441     21.8861
     440    193600     20.9762           480    230400     21.9089










Table K (continued) 


  Number    Square   Square root      Number    Square   Square root
     481    231361     21.9317           521    271441     22.8254
     482    232324     21.9545           522    272484     22.8473
     483    233289     21.9773           523    273529     22.8692
     484    234256     22.0000           524    274576     22.8910
     485    235225     22.0227           525    275625     22.9129
     486    236196     22.0454           526    276676     22.9347
     487    237169     22.0681           527    277729     22.9565
     488    238144     22.0907           528    278784     22.9783
     489    239121     22.1133           529    279841     23.0000
     490    240100     22.1359           530    280900     23.0217
     491    241081     22.1585           531    281961     23.0434
     492    242064     22.1811           532    283024     23.0651
     493    243049     22.2036           533    284089     23.0868
     494    244036     22.2261           534    285156     23.1084
     495    245025     22.2486           535    286225     23.1301
     496    246016     22.2711           536    287296     23.1517
     497    247009     22.2935           537    288369     23.1733
     498    248004     22.3159           538    289444     23.1948
     499    249001     22.3383           539    290521     23.2164
     500    250000     22.3607           540    291600     23.2379
     501    251001     22.3830           541    292681     23.2594
     502    252004     22.4054           542    293764     23.2809
     503    253009     22.4277           543    294849     23.3024
     504    254016     22.4499           544    295936     23.3238
     505    255025     22.4722           545    297025     23.3452
     506    256036     22.4944           546    298116     23.3666
     507    257049     22.5167           547    299209     23.3880
     508    258064     22.5389           548    300304     23.4094
     509    259081     22.5610           549    301401     23.4307
     510    260100     22.5832           550    302500     23.4521
     511    261121     22.6053           551    303601     23.4734
     512    262144     22.6274           552    304704     23.4947
     513    263169     22.6495           553    305809     23.5160
     514    264196     22.6716           554    306916     23.5372
     515    265225     22.6936           555    308025     23.5584
     516    266256     22.7156           556    309136     23.5797
     517    267289     22.7376           557    310249     23.6008
     518    268324     22.7596           558    311364     23.6220
     519    269361     22.7816           559    312481     23.6432
     520    270400     22.8035           560    313600     23.6643












Table K (continued) 


  Number    Square   Square root      Number    Square   Square root
     561    314721     23.6854           601    361201     24.5153
     562    315844     23.7065           602    362404     24.5357
     563    316969     23.7276           603    363609     24.5561
     564    318096     23.7487           604    364816     24.5764
     565    319225     23.7697           605    366025     24.5967
     566    320356     23.7908           606    367236     24.6171
     567    321489     23.8118           607    368449     24.6374
     568    322624     23.8328           608    369664     24.6577
     569    323761     23.8537           609    370881     24.6779
     570    324900     23.8747           610    372100     24.6982
     571    326041     23.8956           611    373321     24.7184
     572    327184     23.9165           612    374544     24.7386
     573    328329     23.9374           613    375769     24.7588
     574    329476     23.9583           614    376996     24.7790
     575    330625     23.9792           615    378225     24.7992
     576    331776     24.0000           616    379456     24.8193
     577    332929     24.0208           617    380689     24.8395
     578    334084     24.0416           618    381924     24.8596
     579    335241     24.0624           619    383161     24.8797
     580    336400     24.0832           620    384400     24.8998
     581    337561     24.1039           621    385641     24.9199
     582    338724     24.1247           622    386884     24.9399
     583    339889     24.1454           623    388129     24.9600
     584    341056     24.1661           624    389376     24.9800
     585    342225     24.1868           625    390625     25.0000
     586    343396     24.2074           626    391876     25.0200
     587    344569     24.2281           627    393129     25.0400
     588    345744     24.2487           628    394384     25.0599
     589    346921     24.2693           629    395641     25.0799
     590    348100     24.2899           630    396900     25.0998
     591    349281     24.3105           631    398161     25.1197
     592    350464     24.3311           632    399424     25.1396
     593    351649     24.3516           633    400689     25.1595
     594    352836     24.3721           634    401956     25.1794
     595    354025     24.3926           635    403225     25.1992
     596    355216     24.4131           636    404496     25.2190
     597    356409     24.4336           637    405769     25.2389
     598    357604     24.4540           638    407044     25.2587
     599    358801     24.4745           639    408321     25.2784
     600    360000     24.4949           640    409600     25.2982








Table K (continued) 


  Number    Square   Square root      Number    Square   Square root
     641    410881     25.3180           681    463761     26.0960
     642    412164     25.3377           682    465124     26.1151
     643    413449     25.3574           683    466489     26.1343
     644    414736     25.3772           684    467856     26.1534
     645    416025     25.3969           685    469225     26.1725
     646    417316     25.4165           686    470596     26.1916
     647    418609     25.4362           687    471969     26.2107
     648    419904     25.4558           688    473344     26.2298
     649    421201     25.4755           689    474721     26.2488
     650    422500     25.4951           690    476100     26.2679
     651    423801     25.5147           691    477481     26.2869
     652    425104     25.5343           692    478864     26.3059
     653    426409     25.5539           693    480249     26.3249
     654    427716     25.5734           694    481636     26.3439
     655    429025     25.5930           695    483025     26.3629
     656    430336     25.6125           696    484416     26.3818
     657    431649     25.6320           697    485809     26.4008
     658    432964     25.6515           698    487204     26.4197
     659    434281     25.6710           699    488601     26.4386
     660    435600     25.6905           700    490000     26.4575
     661    436921     25.7099           701    491401     26.4764
     662    438244     25.7294           702    492804     26.4953
     663    439569     25.7488           703    494209     26.5141
     664    440896     25.7682           704    495616     26.5330
     665    442225     25.7876           705    497025     26.5518
     666    443556     25.8070           706    498436     26.5707
     667    444889     25.8263           707    499849     26.5895
     668    446224     25.8457           708    501264     26.6083
     669    447561     25.8650           709    502681     26.6271
     670    448900     25.8844           710    504100     26.6458
     671    450241     25.9037           711    505521     26.6646
     672    451584     25.9230           712    506944     26.6833
     673    452929     25.9422           713    508369     26.7021
     674    454276     25.9615           714    509796     26.7208
     675    455625     25.9808           715    511225     26.7395
     676    456976     26.0000           716    512656     26.7582
     677    458329     26.0192           717    514089     26.7769
     678    459684     26.0384           718    515524     26.7955
     679    461041     26.0576           719    516961     26.8142
     680    462400     26.0768           720    518400     26.8328












Table K (continued)

Number  Square    Square root  Number  Square    Square root
721     51 98 41  26 8514      761     57 91 21  27 5862
722     52 12 84  26 8701      762     58 06 44  27 6043
723     52 27 29  26 8887      763     58 21 69  27 6225
724     52 41 76  26 9072      764     58 36 96  27 6405
725     52 56 25  26 9258      765     58 52 25  27 6586
726     52 70 76  26 9444      766     58 67 56  27 6767
727     52 85 29  26 9629      767     58 82 89  27 6948
728     52 99 84  26 9815      768     58 98 24  27 7128
729     53 14 41  27 0000      769     59 13 61  27 7308
730     53 29 00  27 0185      770     59 29 00  27 7489
731     53 43 61  27 0370      771     59 44 41  27 7669
732     53 58 24  27 0555      772     59 59 84  27 7849
733     53 72 89  27 0740      773     59 75 29  27 8029
734     53 87 56  27 0924      774     59 90 76  27 8209
735     54 02 25  27 1109      775     60 06 25  27 8388
736     54 16 96  27 1293      776     60 21 76  27 8568
737     54 31 69  27 1477      777     60 37 29  27 8747
738     54 46 44  27 1662      778     60 52 84  27 8927
739     54 61 21  27 1846      779     60 68 41  27 9106
740     54 76 00  27 2029      780     60 84 00  27 9285
741     54 90 81  27 2213      781     60 99 61  27 9464
742     55 05 64  27 2397      782     61 15 24  27 9643
743     55 20 49  27 2580      783     61 30 89  27 9821
744     55 35 36  27 2764      784     61 46 56  28 0000
745     55 50 25  27 2947      785     61 62 25  28 0179
746     55 65 16  27 3130      786     61 77 96  28 0357
747     55 80 09  27 3313      787     61 93 69  28 0535
748     55 95 04  27 3496      788     62 09 44  28 0713
749     56 10 01  27 3679      789     62 25 21  28 0891
750     56 25 00  27 3861      790     62 41 00  28 1069
751     56 40 01  27 4044      791     62 56 81  28 1247
752     56 55 04  27 4226      792     62 72 64  28 1425
753     56 70 09  27 4408      793     62 88 49  28 1603
754     56 85 16  27 4591      794     63 04 36  28 1780
755     57 00 25  27 4773      795     63 20 25  28 1957
756     57 15 36  27 4955      796     63 36 16  28 2135
757     57 30 49  27 5136      797     63 52 09  28 2312
758     57 45 64  27 5318      798     63 68 04  28 2489
759     57 60 81  27 5500      799     63 84 01  28 2666
760     57 76 00  27 5681      800     64 00 00  28 2843




Table K (continued)

Number  Square    Square root  Number  Square    Square root
801     64 16 01  28 3019      841     70 72 81  29 0000
802     64 32 04  28 3196      842     70 89 64  29 0172
803     64 48 09  28 3373      843     71 06 49  29 0345
804     64 64 16  28 3549      844     71 23 36  29 0517
805     64 80 25  28 3725      845     71 40 25  29 0689
806     64 96 36  28 3901      846     71 57 16  29 0861
807     65 12 49  28 4077      847     71 74 09  29 1033
808     65 28 64  28 4253      848     71 91 04  29 1204
809     65 44 81  28 4429      849     72 08 01  29 1376
810     65 61 00  28 4605      850     72 25 00  29 1548
811     65 77 21  28 4781      851     72 42 01  29 1719
812     65 93 44  28 4956      852     72 59 04  29 1890
813     66 09 69  28 5132      853     72 76 09  29 2062
814     66 25 96  28 5307      854     72 93 16  29 2233
815     66 42 25  28 5482      855     73 10 25  29 2404
816     66 58 56  28 5657      856     73 27 36  29 2575
817     66 74 89  28 5832      857     73 44 49  29 2746
818     66 91 24  28 6007      858     73 61 64  29 2916
819     67 07 61  28 6182      859     73 78 81  29 3087
820     67 24 00  28 6356      860     73 96 00  29 3258
821     67 40 41  28 6531      861     74 13 21  29 3428
822     67 56 84  28 6705      862     74 30 44  29 3598
823     67 73 29  28 6880      863     74 47 69  29 3769
824     67 89 76  28 7054      864     74 64 96  29 3939
825     68 06 25  28 7228      865     74 82 25  29 4109
826     68 22 76  28 7402      866     74 99 56  29 4279
827     68 39 29  28 7576      867     75 16 89  29 4449
828     68 55 84  28 7750      868     75 34 24  29 4618
829     68 72 41  28 7924      869     75 51 61  29 4788
830     68 89 00  28 8097      870     75 69 00  29 4958
831     69 05 61  28 8271      871     75 86 41  29 5127
832     69 22 24  28 8444      872     76 03 84  29 5296
833     69 38 89  28 8617      873     76 21 29  29 5466
834     69 55 56  28 8791      874     76 38 76  29 5635
835     69 72 25  28 8964      875     76 56 25  29 5804
836     69 88 96  28 9137      876     76 73 76  29 5973
837     70 05 69  28 9310      877     76 91 29  29 6142
838     70 22 44  28 9482      878     77 08 84  29 6311
839     70 39 21  28 9655      879     77 26 41  29 6479
840     70 56 00  28 9828      880     77 44 00  29 6648












Table K (continued)

Number  Square    Square root  Number  Square    Square root
881     77 61 61  29 6816      921     84 82 41  30 3480
882     77 79 24  29 6985      922     85 00 84  30 3645
883     77 96 89  29 7153      923     85 19 29  30 3809
884     78 14 56  29 7321      924     85 37 76  30 3974
885     78 32 25  29 7489      925     85 56 25  30 4138
886     78 49 96  29 7658      926     85 74 76  30 4302
887     78 67 69  29 7825      927     85 93 29  30 4467
888     78 85 44  29 7993      928     86 11 84  30 4631
889     79 03 21  29 8161      929     86 30 41  30 4795
890     79 21 00  29 8329      930     86 49 00  30 4959
891     79 38 81  29 8496      931     86 67 61  30 5123
892     79 56 64  29 8664      932     86 86 24  30 5287
893     79 74 49  29 8831      933     87 04 89  30 5450
894     79 92 36  29 8998      934     87 23 56  30 5614
895     80 10 25  29 9166      935     87 42 25  30 5778
896     80 28 16  29 9333      936     87 60 96  30 5941
897     80 46 09  29 9500      937     87 79 69  30 6105
898     80 64 04  29 9666      938     87 98 44  30 6268
899     80 82 01  29 9833      939     88 17 21  30 6431
900     81 00 00  30 0000      940     88 36 00  30 6594
901     81 18 01  30 0167      941     88 54 81  30 6757
902     81 36 04  30 0333      942     88 73 64  30 6920
903     81 54 09  30 0500      943     88 92 49  30 7083
904     81 72 16  30 0666      944     89 11 36  30 7246
905     81 90 25  30 0832      945     89 30 25  30 7409
906     82 08 36  30 0998      946     89 49 16  30 7571
907     82 26 49  30 1164      947     89 68 09  30 7734
908     82 44 64  30 1330      948     89 87 04  30 7896
909     82 62 81  30 1496      949     90 06 01  30 8058
910     82 81 00  30 1662      950     90 25 00  30 8221
911     82 99 21  30 1828      951     90 44 01  30 8383
912     83 17 44  30 1993      952     90 63 04  30 8545
913     83 35 69  30 2159      953     90 82 09  30 8707
914     83 53 96  30 2324      954     91 01 16  30 8869
915     83 72 25  30 2490      955     91 20 25  30 9031
916     83 90 56  30 2655      956     91 39 36  30 9192
917     84 08 89  30 2820      957     91 58 49  30 9354
918     84 27 24  30 2985      958     91 77 64  30 9516
919     84 45 61  30 3150      959     91 96 81  30 9677
920     84 64 00  30 3315      960     92 16 00  30 9839

















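Table K (continued)

Number  Square    Square root  Number  Square    Square root
961     92 35 21  31 0000      981     96 23 61  31 3209
962     92 54 44  31 0161      982     96 43 24  31 3369
963     92 73 69  31 0322      983     96 62 89  31 3528
964     92 92 96  31 0483      984     96 82 56  31 3688
965     93 12 25  31 0644      985     97 02 25  31 3847
966     93 31 56  31 0805      986     97 21 96  31 4006
967     93 50 89  31 0966      987     97 41 69  31 4166
968     93 70 24  31 1127      988     97 61 44  31 4325
969     93 89 61  31 1288      989     97 81 21  31 4484
970     94 09 00  31 1448      990     98 01 00  31 4643
971     94 28 41  31 1609      991     98 20 81  31 4802
972     94 47 84  31 1769      992     98 40 64  31 4960
973     94 67 29  31 1929      993     98 60 49  31 5119
974     94 86 76  31 2090      994     98 80 36  31 5278
975     95 06 25  31 2250      995     99 00 25  31 5436
976     95 25 76  31 2410      996     99 20 16  31 5595
977     95 45 29  31 2570      997     99 40 09  31 5753
978     95 64 84  31 2730      998     99 60 04  31 5911
979     95 84 41  31 2890      999     99 80 01  31 6070
980     96 04 00  31 3050      1000    100 00 00  31 6228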









Index 


Absolute zero, 14
Addition theorem of probability, 84-85
Age allowances, 267-268
Aitken, A. C., 396, 434
Aitken's numerical solution, 397-400
Alternative hypothesis, 163, 165-167
Analysis of covariance, 326-339
  adjusting sum of squares, 330-332
  computation, 332-337
  degrees of freedom, 332
  extended use of, 339
  notation, 327-328
  partitioning a sum of products, 328-329
  regression lines in, 329-330
  in testing homogeneity of regression coefficients, 337-339
  trend analysis, 342-353
  variance estimates, 332
Analysis of variance, 281-323
  assumptions underlying, 294-295
  choice-of-error term, 310-311
  classification, higher, 323
    one-way, 281-297
    two-way, 300-323
  comparison of means following an F test, 295-297
  computation, one-way classification, 289-293
    two-way classification, 312-319
    unequal numbers in subclasses, 319-323
  covariance method (see Analysis of covariance)
  degrees of freedom, 286-287, 304-305
  F ratio in, 288, 293-297, 304-305, 310-312, 319
  interaction, nature of, 305-307
  mean square (see variance estimate below)
  models, finite, random, fixed, and mixed, 307-311
  multiple comparisons, 295-297
  notation, 283-284, 301-302
  null hypothesis in, 288, 308
  by ranks, correlated samples, 363-365
    independent samples, 362-363
  sum of squares, between groups, 285-286
    within groups, 285-286
    for interaction, 300-304, 305-307
    partitioning, 285-286, 302-304
    pooling, 311-312
  for two groups, 293-294
  with unequal numbers in subclasses, 319-323
  variance estimate, expectation, 286-287, 307-310
    meaning of, 287-288, 307-310
    one-way classification, 286-287
    two-way classification, 304-305
Arbitrary origin, 48-49
Arithmetic mean (see Mean)
Aspin, Alice A., 173, 434
Assumed mean, 49-50
Attenuation, 382-383
Auble, D., 360, 434
Average, 45-58
  (See also Mean; Median)
Average (mean) deviation, 62-63





Barlow’s Tables, 18
Beta coefficient, 292-293
Biased estimate, 64
Bibliography, 434-436
Bimodal distribution, 41, 56
Binder, A., 312, 434
Binet, Alfred, 3
Binomial distribution, 39, 87-95
  goodness of fit, 196-197
  and hypothesis testing, 92-93
  kurtosis of, 90-91
  limiting form, 96
  mean of, 90-91
  related to normal curve, 96-97
  skewness of, 90-91
  variance of, 90-91
Biserial correlation, 242-244
Bivariate distribution, 112-114, 122

Change of origin, 48-49
Chi square (χ²), 192-213
  applied in, analysis of variance by ranks, 365
    contingency tables, 200-204
    rank test, for k correlated samples, 365
      for k independent samples, 362-363
    sign test, for k independent samples, 357-358
      for two correlated samples, 356-357
      for two independent samples, 355-356
  computation, combining frequencies in, 197-198
  and contingency coefficient, 235
  correction for continuity, 206-208
  critical values, table, 407
  defined, 192
  degrees of freedom, 194-195, 199, 203, 208, 210-212, 356-358, 363, 365
  distribution, 193-195
  formulas for, 192, 204, 206-207, 211, 235
  for fourfold table, 204, 207
  one- and two-tailed tests, 210-211
  relation of, contingency coefficient, 235
    normal deviate, 205, 357
    phi coefficient, 237
    sample size, 211
  sampling distribution, 193-195
  small expected frequencies, 206-208
  in test of, coefficient of concordance, 228
    coefficient of consistence, 231-232
    difference between proportions, 204-206
    goodness of fit, 195-200
    independence, 200-204
  unequal and disproportionate frequencies, 319-321
Class boundaries, 30
Class interval, 28-31
  conventions regarding, 28-29
  defined, 28
  distribution of observations within, 31
  exact limits, 29-30
  mid-point, 31
Classification variables, 271, 278-279
Clelland, Richard C., 355, 436
Cochran, W. G., 171, 271, 280, 434
Coefficient, of concordance, 225-228
    formula for, 226
    related to rho, 226-227
    significance, 227-228
    with tied ranks, 227
  of consistence, 228-232
    formula for, 230
    significance, 231
  of orthogonal polynomials, table of, 417
  (See also Contingency coefficient; Correlation coefficient; Phi coefficient; Reliability coefficient)
Combinations, 85-87
Complete factorial experiment, 276
Comrie, L. J., 18, 434
Concordance (see Coefficient, of concordance)
Condensation, pivotal, 397-400
Confidence interval, 150-153, 157-159
  for correlation coefficient, 186
  for means of large samples, 151-153
  for means of small samples, 157-158
  for median, 158-159
  for proportion, 158
  for standard deviation, 158-159
Consistence (see Coefficient, of consistence)




Consistent estimate, 150-151
Constant, definition of, 11
Constant process, 3
Contingency coefficient, 106, 234-236
  and chi square, 235
  maximum value, 235-236
  significance, 236
Contingency table, 201-202
Continuity, correction for, 206-207, 223-225
Coombs, C. H., 14, 436
Cornell, Francis G., 194n., 434
Correction, for attenuation, 382-383
  for continuity, 206-207, 223-225
  for grouping, 69-70, 174
Correlation, 105-130, 216-232, 234-249, 388-412
  measures of, biserial, 242-244
    concordance, 225-228
    contingency coefficient, 234-236
    Kendall’s tau, 220-225
    multiple (see Multiple correlation)
    partial, 388-390
    phi coefficient, 236-239
    point biserial, 239-242
    product-moment (see Product-moment correlation)
    rank (see Rank correlation)
    Spearman’s rho, 216-220
    tetrachoric, 234, 244-246
  and prediction, 118-130
  ratios, 246-249
  and regression, 122-125
  and standard score, 108-110
  of sums, 390-392
  t ratio for, 187, 189, 220, 242
  between true scores, 382-383
  variance interpretation, 126-128
Correlation coefficient, confidence interval for, 186
  critical values, table, 413
  effect of measurement error on, 382-383
  for multiple correlation, 393, 396
  sampling distribution (see Sampling distribution)
  significance, 186-187
    of difference, correlated samples, 188-189
      independent samples, 187-188
  standard error, 184-186
  tetrachoric, significance test, 246
Correlation ratios, eta (η), 234, 246-249
  related to r, 248
  significance, 248-249
Correlational investigations, 16
Cosine-pi coefficient, 244-245
Covariance, 115
Covariance analysis (see Analysis of covariance)
Cox, D. R., 280, 434
Cox, G. M., 171, 274, 280, 434
Cronbach, L. J., 17, 386, 434
Cumulative distribution, 32

Darwin, Charles, 6-7
Davis, R. L., 14, 436
Decile point, 256
Degrees of freedom, in analysis of covariance (see Analysis of covariance)
  in analysis of variance (see Analysis of variance)
  for chi square, 194-195, 199, 203, 208, 210-212, 356-358, 363, 365
  for contingency tables, 200-204
  for F, 286-287, 291, 296, 304, 332, 343, 348, 352
  geometric interpretation, 157
  meaning, 65, 156-157
  in multiple comparisons, 296
  for t, 155, 167-168, 170-173, 184, 187, 189, 220, 242, 293, 390
Delta scores, 267
Descriptive statistics, 9
Design of experiments, 270-280
  complete factorial experiments, 276
  factorial experiments, 275-277
  Latin square, 278
  randomization, 274-275
  randomized block, 277-278
  single-factor experiments, 273-274
  terminology, 271-272
Deviation, average (mean), 62-63
  standard (see Standard deviation)
Difference (see Significance test, of difference; Standard error, of difference; t ratio, for difference)





Directional test, 165-167
Distribution, bimodal, 41, 56
  binomial (see Binomial distribution)
  bivariate, 112-114, 122
  chi square, 193-195
  cumulative, 32
  F, 182-183
  frequency, 24-43
  graphic representation, 33-38
  J-shaped, 41
  leptokurtic, 39, 43
  mesokurtic, 39
  normal, 95-103
  platykurtic, 39
  properties of, 38-43
  rank, 26
  rectangular, 41, 251
  sampling (see Sampling distribution)
  skewed, 39-41
  t, 153-156
  U-shaped, 41
Distribution-free tests (see Nonparametric tests)
Doolittle method, 396
Duncan, D. B., 296, 434

Edwards, Allen L., 245, 297, 412n., 434
Efficiency of estimate, 150-151, 355
Error, of estimate, 125-126
  grouping, 69-70, 174
  of measurement, 373-386
    effect on correlation coefficient, 382-383
    effect on mean, 375-376
    effect on sampling variance of mean, 381-382
    effect on variance, 375-376
    random, 373
    standard deviation, 383-385
    systematic, 373
  Type I, 163-165
  Type II, 163-165
Estimate, 132, 150-151
  biased, 64
  consistent, 150-151
  efficient, 150-151, 355
  error of, 125-126
  interval, 150
  meaning, 10, 150-151
  point, 150
  relative efficiency of, 150-151, 355
  sufficient, 150-151
  unbiased, 150-151
Estimation, 150-159
Eta (η) (see Correlation ratios)
Exact test of significance for fourfold table, 208-210
Expected value, 150, 287-288, 307-311, 358-359
Experimental design (see Design of experiments)

F ratio, 182-183
  in analysis of variance, 288, 293-297, 304-305, 310-312, 319
  bias in, 311-312, 322-323
  critical values, table, 408-411
  related to t, 293-294
  in test of, correlation ratio, 248-249
    difference between variances, 181-183
    homogeneity of regression, 338
    linearity of regression, 248
    multiple correlation coefficient, 401
Factor analysis, 3
Factorial experiments, 275-277
Fechner, Gustav, 2
Ferguson, G. A., 25n., 113n., 119n., 142n., 369, 379-380, 386, 434, 435
Finney, D. J., 3, 210, 280, 434
Fisher, R. A., 7, 61, 133, 186, 208, 275, 278, 281, 297, 347, 406n.-407n., 413n., 434
Fisher's zᵣ transformation, 186, 188, 254
Fishman, Joshua A., 246, 434
Fit, goodness of, 195-200
Fitting of line, 118-122
Fourfold point correlation (see Phi coefficient)
Fourfold table, exact test of significance for, 208-210
Frequency, 28
  comparison (see Chi square)
  distribution, 24-43
    (See also Distribution)
  observed, 192
  polygon, 31, 36-37
    cumulative, 37-38




Frequency, theoretical, 192
Freund, John E., 159, 434
Friedman, M., 363, 365, 434
  two-way analysis of variance by ranks, 363-365
Fryer, H. C., 62, 434
Function, meaning of, 95-97

Galton, Francis, 7, 105, 106
Geometric mean, 45
Glossary of symbols, 431-433
Goodness of fit, 195-200
Gosset, W. S., 155
Gourlay, Neil, 322-323, 434
Graphs, 33-38
Gronow, D. G. C., 173, 434
Grouping error, 69-70, 174
  effect, on mean, 69-70, 174
    on variance, 69-70
  and sampling statistics, 174
  Sheppard’s correction for, 70
Guilford, J. P., 245, 249, 386, 435
Gulliksen, H., 378, 386, 435

H test, one-way analysis of variance by ranks, 362
Harmonic mean, 45
Histogram, 34-35
Homogeneity, of regression, 337-339
  of variance, 168, 181, 294
Homoscedasticity (see Homogeneity, of variance)
Hypothesis, alternative, 163, 165-167
  null, 162-163
Hypothesis testing (see Significance)

Independence tests, 200-204
Inference, statistical, 1-2, 8-10, 132
Integers, first N, in nonparametric tests, 358-371
    in rank correlation, 216-232
    standard deviation, 71-72
    sum, 21-22
    sum of squares, 71
Interaction, 305-307
Interval, confidence, 150-153, 157-159
  equality, 13
  estimate, 150
  grouping (see Class interval)
  variable, 13, 272-273
Invariance, 254

J-shaped distribution, 41
Jackson, R. W. B., 25n., 34, 113n., 119n., 142n., 386, 435
Jenkins, W. L., 246
Johnson, Palmer O., 34, 159, 196, 197n., 294, 435

Kaiser, Henry F., 166, 433
Keeping, E. S., 33n., 159, 322, 347, 435
Kempthorne, O., 323, 436
Kendall, M. G., 220, 222, 224, 227, 415n., 435
Kendall’s coefficient, of concordance, 225-228
  of consistence, 228-232
Kendall’s tau, 220-225, 365-369
  significance, table for testing, 415
Kenney, John F., 33n., 159, 322, 435
Kruskal, W. H., 362, 435
Kruskal-Wallis one-way analysis of variance by ranks, 362-363
Kuder, G. F., 379, 435
Kuder-Richardson formulas, 379-380, 385-386
Kurtosis, 39, 43, 76
  of binomial distribution, 90-91

Lacey, John I., 267, 435
Large sample statistics, 153, 156
Latin squares, 278
Least-squares method, 53, 118-119
Leptokurtic distribution, 39, 43
Lev, Joseph, 189, 244, 323, 436
Levels of significance, 164-165
Lewis, D., 154n.
Lindquist, E. F., 294, 435
Lindzey, Gardner, 62, 435
Line, fitting of, 118-122
Linear regression, 118-125, 342-346
Logarithmic transformation, 254
Lord, Frederic M., 378, 384-386, 435





MacMeeken, A. M., 123n., 435
McNemar, Quinn, 179, 197, 198n.-199n., 274, 323, 435
Mann, H. B., 360, 435
Mann-Whitney U test, 358-360
Mean, arithmetic, 45-58
    of combined groups, 52
    defined, 45-46
    formulas for, 46-47, 50
    properties, 52-53
    related to median and mode, 57-58
    sampling distribution, 138-144
  assumed, 49-50
  geometric, 45
  harmonic, 45
Mean deviation, 62-63
Mean square (see Analysis of variance, variance estimate)
Measurement, error of (see Error, of measurement)
Median, 54-56
  confidence interval for, 158-159
  standard error of, 158
Mendel, Abbé, 195
Mesokurtic distribution, 39
Mode, 56
Models in analysis of variance, 307-311
Moments, 75-76
Monotonic functions, 365-371
Monotonic trend analysis, 365-367
Muller, G. E., 3
Multiple comparisons, 295-297
Multiple correlation, 388-402
  Aitken’s numerical solution, 397-400
  coefficient, 393, 396, 401
  Doolittle method, 396
  geometry of multiple regression, 394-396
  interpretation, 401
  with more than three variables, 396-400
  regression equations, 394, 396
  sampling error, 401
  with three variables, 390-393
Multiple regression, 390-402
Multiplication theorem of probability, 84-85

Nair, K. R., 159, 435
Nondirectional test, 165-167
Nonlinear regression, 128-129, 346-351
Nonlinearity test, 248, 346-351
Nonparametric tests, 15, 174, 254, 354-371
  Mann-Whitney U, 358-360
  monotonic trend, for correlated samples, 367-369
    for independent samples, 365-367
  rank, 358-371
    for k correlated samples, 363-365
    for k independent samples, 362-363
    for two correlated samples, 360-361
    for two independent samples, 358-360
  sign, for k independent samples, 357-358
    for two correlated samples, 356-357
    for two independent samples, 355-356
  of significance, 354-371
  for variation in independent samples, 369-371
Normal distribution curve, 95-103
  as approximation to Binomial, 96-97, 102-103
  area under, 99-101
  formula for, 97-98
  goodness of fit to, 197-200
  ordinates, 97-99
  standard-score form, 98
  summary of properties, 103
  table of ordinates and areas, 404-405
  transformation to, 262-265
Norms, 253
Null hypothesis, in analysis of variance, 288, 308
  meaning, 162-163

Olds, E. G., 414n.
One- and two-tailed tests, 165-167
  for chi square, 210-211
Ordinates of normal curve, 97-99
  table of, 404-405
Origin, arbitrary, 49
  change of, 48-49
Orthogonal comparisons, 295
Orthogonal polynomials, 343, 346-352
  table of coefficients, 417





Paired-comparisons method, 228-231
Parallel tests, 377
Parameter, 10, 132
Partial correlation, 388-390
Pascal’s triangle, 90
Pearson, Karl, 7, 106
Percentage, standard error of, 158
Percentiles, 256-262
Permutations, 85-86
Phi coefficient, 234, 236-239
  effect of marginal totals on, 238-239
  related to chi square, 237
  standard error, 239
Pivotal condensation, 397-400
Platykurtic distribution, 39
Point biserial correlation, 239-242
Point estimate, 150
Polynomials, orthogonal, 343, 346-352
Population, defined, 4, 132
  finite, 5, 138-141
  infinite, 5, 141-144
  numerical properties, 6
Prediction, 117-130
  errors, 125-126
  meaning, 117
  in relation to correlation, 122-125
Probability, 80-93
  addition theorem, 84-85
  and binomial, 87-90
  conditional, 83-84
  exact, 208
  joint, 83-84
  multiplication theorem, 84-85
  nature, 80-82
Probits, method of, 3
Product-moment correlation, 105-130
  assumptions underlying, 128-130
  computation, 110-112
  critical values, table, 413
  definition, 108-110
  direction, 109-110
  related to regression, 122-125
  sampling error, 184-186
  variance interpretation, 126-128
Proportion, significance of difference, correlated samples, 178-181
    independent samples, 176-178, 204-206
  standard error of, 144-146, 158
Psychophysics, 2-3

Random, meaning of, 132-133
Randomization, 274-275
Randomized block experiment, 277-278
Randomly parallel tests, 384-385
Range, 62
Rank correlation, 106, 216-232
  Kendall’s tau, 220-225
    significance, 222-224
    with tied ranks, 221-222
  Spearman’s rho, 216-220
    significance, 219-220
    with tied ranks, 218-219
Rank distribution, 26
Rank tests of significance, 358-371
Rank transformation, 254
Rectangular distribution, 41, 251
Regression, in analysis of covariance, 329-330
  bivariate distribution, 122
  equation, 121-124, 393-394, 396
  homogeneity of, 337-339
  linear, 118-125, 342-346
  meaning, 118-122
  multiple, 390-402
  nonlinear, 128-129, 346-351
  related to correlation, 122-125
  transformations, 266-267
Reliability (see Error, of measurement; Reliability coefficient)
Reliability coefficient, and attenuation, 382-383
  defined, 376-377
  for difference scores, 383
  effect of test length on, 380-381
  in experimental psychology, 385-386
  methods of determining, 377-380
Richardson, M. W., 379, 435
Ryan, T. A., 296, 435

S, in definition of tau, 220-222
  sampling distribution, 222-223, 415
  significance of, 222-224
  standard error of, 222-224
  table of, 415
  in trend analysis, 365-369
Sample, meaning of, 8-10
Sampling, 132-148
  proportional stratified sample, 134
  random, 132-133





Sampling, stratified random sampling, 134
  systematic, 133-134
  unit, 300
Sampling distribution, of chi square, 193-195
  of correlation coefficient, 184-186
    biserial, 243
    tetrachoric, 246
  of differences, 146-148
  experimental, 137
  of F, 181-183
  of mean, from finite population, 138-141
    from indefinitely large population, 141-144
  meaning, 137-138
  of proportion, 144-146
  of S, in the definition of tau, 222-223, 415
  of score or measurement, 383-385
  of t, 153-156
  theoretical, 137-138
Sampling error, meaning, 135-137
  of multiple correlation, 401
  of product-moment correlation, 184-186
Sampling theory, 132-147
Scale, stanine, 265-266
Scatter diagram, 107
Scheffé, H., 296, 436
Scheffé method of multiple comparisons, 296-297
Set, 82-83
Sheppard’s correction, 70
Siegel, Sidney, 210, 355, 365, 369, 436
Sign tests, 355-356
Significance, levels, 164-165
  meaning, 161-163
  nonparametric tests, 354-371
  rank tests, 358-371
Significance test, for biserial correlation, 243-244
  for coefficient, concordance, 227-228
    consistence, 231-232
    correlation, 186-187
  for correlation ratio, 248-249
  of difference, correlated proportions, 178-181, 204-206
    correlated variances, 183-184
    correlation coefficient, 187-189
    independent proportions, 176-178
    for means, of correlated samples, 169-171
      of independent samples, 167-169
      under non-normality, 173-174
      where variances are unequal, 171-173
    for variances of independent samples, 181-183
  directional and nondirectional, 165-167
  exact, for fourfold tables, 208-210
  of interaction, 305-311
  of Kendall’s tau, 222-225
  for linear trend, 342-344
  for multiple correlation coefficient, 401
  for nonlinear trend, 346-351
  for nonlinearity, 248
  one- and two-tailed, 165-167
  for phi coefficient, 239
  for point biserial correlation, 242
  rank, 358-371
  for Spearman’s rho, 219-220
  for tetrachoric correlation coefficient, 244-246
Single-factor experiments, 273-274
Skewness, 39, 42-43, 76
  of binomial distribution, 90-91
Slope of line, 119-121
Small sample statistics, 153, 156
Snedecor, G. W., 322, 409n., 436
Sorenson, H., 418n.-429n.
Spearman-Brown formula, 378, 381
Spearman’s rank coefficient, 216-220
  critical values, table, 414
Square root transformation, 254
Squares and square roots, table, 418-429
Standard deviation, 66
  adding a constant, 70-71
  advantages, 74-75
  calculation, 67-69
  for combined groups, 72-73
  confidence interval, 158-159
  effects of grouping on, 69-70
  of first N integers, 71-72
  of measurement error, 383-385
  multiplying by a constant, 70-71
  standard error, 159





Standard error, 137-138
  of biserial correlation coefficient, 243-244
  of correlation coefficient, 184-186
  of difference, 146-148
    for correlated proportions, 178-181
    for independent proportions, 176-178
    for means of independent samples, 147
    for zᵣ's (transformed r), 188
  of estimate, 125-126
  of mean, effect of grouping on, 174
    from finite population, 138-141
    from indefinitely large population, 141-144
    meaning of, 138
  of measurement, 383-385
  of median, 158
  of percentage, 158
  of phi coefficient, 239
  of proportion, 144-146, 158
  of S, in definition of tau, 222-224
  of standard deviation, 159
  of tetrachoric correlation coefficient, 246
  of zᵣ (transformed r), 186
Standard score, and correlation, 108-110
  defined, 73-74
  sum of squares of, 74
  transformation, 255-256
Standardization of tests, 253
Stanine scale, 265-266
Statistical inference, 1-2, 8-10, 132
Statistics, as study of population, 4-6
  as study of variation, 6-8, 61
Stevens, S. S., 12, 14, 436
“Student,” 155
Sufficient estimate, 150-151
Summation notation explained, 20-22
Symbols, glossary of, 431-433

t distribution, 153-156
t ratio, 153-156
  assumptions underlying, 354
  in comparison of means following F test, 296
  and confidence limits, 157-158
  for correlation, 187, 189, 220, 242
  critical values, table, 406
  for difference, of correlated variances, 183-184
    of means, for correlated samples, 170
      for independent samples, 168
      unequal variances, 171-173
  for partial correlation, 390
  for point biserial correlation, 242
  related to F, 293-294
  for Spearman’s rho, 220
T-score transformation, 262-265
Tabular representation, rules, 32-33
Tate, M. W., 355, 436
Tetrachoric correlation, 234, 244-246
Thomson, G. H., 396, 398n., 436
Thrall, R. M., 14, 436
Thurstone, L. L., 34n., 436
Tied ranks in coefficient of concordance, 227
  in Kendall’s tau, 221-222
  in Spearman’s rho, 218-219
Ties in rank test, for k independent samples, 362
  for two independent samples, 360
Torgerson, Warren S., 12, 14, 436
Transformation, 251-268
  with age allowances, 267-268
  Fisher’s zᵣ, 186, 254
  logarithmic, 254
  nature, 251-255
  to normal distribution, 262-265
  to percentile ranks, 256-262
  of r to zᵣ, 186, 188
  rank, 254
  regression, 266-267
  square root, 254
  to standard scores, 255-256
  to stanines, 265-266
  to T scores, 262-265
Trend analysis, 342-353, 365-371
  correlated data, 352
  extended applications, 352-353
  linear trend, 342-346
  meaning of, 342
  monotonic, 365-367
  nonlinear trend, 346-353
  orthogonal polynomials, 343, 346-352
  partitioning sum of squares, for linear trend, 342-343





Trend analysis, partitioning sum of squares, using orthogonal polynomials, 347-349
  polynomial regression, 346
  unequal n’s, 344-345, 351-352
True scores, and correlation, 382-383
  defined, 374
  variance, 376-377
Tsao, Fei, 319, 436
Tukey, J. W., 369, 436
Two-tailed test, 165-167
Type I error, 163-165
Type II error, 163-165

U-shaped distribution, 41
U test, Mann-Whitney, 358-360
Unbiased estimate, 150-151
  of variance, 64
Unit of measurement, 18-20
Urban, F. M., 3

Value, expected, 150, 287-288, 307-311, 358-359
Variable, of classification, 271, 278-279
  ct putation ">0 8-69
  continuous, 1"
  defined, 10
  dependent, 11
  discrete, 12, 20
  independent, 11, f 5
  interval, 13, 272-273
  nominal, 12-13, 272-273
  ordinal, 12-13, 272-273
  qualitative, 14
  quantitative, 14
  ratio, 12, 14, 272-273
  types, 10-16
Variance, additive nature, 114
  advantages, 74-75
  analysis (see Analysis of variance)
  biased estimate of, 64
  of binomial distribution, 90-91
  calculation, 67-69
  of combined groups, 72-73
  defined, 64-65
  of differences, 114-115
  effect of grouping on, 69-70
  effect of measurement error on, 375-376
  estimate (see Analysis of variance, variance estimate)
  homogeneity, 168, 181, 294
  sampling (see Standard error, of difference)
  significance of difference, correlated samples, 183-184
    independent samples, 181-183
  of sums, 114-115
  of true scores, 376-377
  unbiased estimate, 64
Variate, defined, 11

Walker, Helen M., 189, 244, 323, 436
Wallis, W. A., 362, 435
Weber, E. H., 2
Welch, B. L., 171-173, 436
Wert, J. E., 404n.
Whitney, D. R., 360, 435
Wilcoxon, F., 360, 416n.
Wilcoxon matched-pairs signed-ranks test, 360-361
  critical values, tables, 416
Wilk, M. B., 323, 436
Winer, B. J., 280, 290, 339, 353, 436
Woo, T. L., 200, 436

Yates, F., 133, 207, 275, 278, 347, 406n.-407n., 413n., 434
Yates’s correction for continuity, 207, 356-357

z score (see Standard score)
zᵣ transformation, 186, 188, 254
  table transforming r to zᵣ, 412
Zero, absolute, 14



